Jiaming Zhou

This is a disambiguation page: it lists papers by multiple people who share the same or a similar name.

Bibliography

2025
RAMPGrasp: Retentive Attention-Based Multiscale Perception Grasp Detection Network.
IEEE Trans. Circuits Syst. Video Technol., August, 2025

RealTalk-CN: A Realistic Chinese Speech-Text Dialogue Benchmark With Cross-Modal Interaction Analysis.
CoRR, August, 2025

End-to-End Humanoid Robot Safe and Comfortable Locomotion Policy.
CoRR, August, 2025

DIFFA: Large Language Diffusion Models Can Listen and Understand.
CoRR, July, 2025

Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards.
CoRR, July, 2025

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition.
CoRR, June, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.
CoRR, June, 2025

A State Space Model for Multiobject Full 3-D Information Estimation From RGB-D Images.
IEEE Trans. Cybern., May, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.
CoRR, May, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.
CoRR, May, 2025

Omni-Perception: Omnidirectional Collision Avoidance for Legged Locomotion in Dynamic Environments.
CoRR, May, 2025

Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization.
CoRR, May, 2025

Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization.
CoRR, May, 2025

GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation.
CoRR, May, 2025

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides.
CoRR, April, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors.
CoRR, March, 2025

Human-Centric Transformer for Domain Adaptive Action Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., February, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.
CoRR, February, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.
CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.
CoRR, January, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

One-Iteration-per-Update (OIpU) Algorithm Applied to MMPaC (Minimum Motion Planning and Control) of Planar Four-Link Robotic Arm Aided with Zhang Equivalency.
Proceedings of the 28th International Conference on Computer Supported Cooperative Work in Design, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
PoseDiffusion: A Coarse-to-Fine Framework for Unseen Object 6-DoF Pose Estimation.
IEEE Trans. Ind. Informatics, September, 2024

A Depth Adaptive Feature Extraction and Dense Prediction Network for 6-D Pose Estimation in Robotic Grasping.
IEEE Trans. Ind. Informatics, February, 2024

TwinFormer: Fine-to-Coarse Temporal Modeling for Long-Term Action Recognition.
IEEE Trans. Multim., 2024

Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models.
CoRR, 2024

GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping.
CoRR, 2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
CoRR, 2024

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data.
CoRR, 2024

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.
CoRR, 2024

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition.
CoRR, 2024

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition.
CoRR, 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

PB-LRDWWS System For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Uncertainty-Aware Mean Opinion Score Prediction.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

CKGConv: General Graph Convolution with Continuous Kernels.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels.
Proceedings of the IEEE International Conference on Acoustics, 2024

CIF-T: A Novel CIF-Based Transducer Architecture for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Uniform-Distribution (UD) Based Time Intervals of GRC (Global Reserve Currency) Transition Year Predicted Narrowly as [2024, 2040] and Generally [2k00, 2k50].
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023
Toward TR-PCB Bubble Detection via an Efficient Attention Segmentation Network and Dynamic Threshold.
IEEE Trans. Instrum. Meas., 2023

Exploring ChatGPT's Potential for Consultation, Recommendations and Report Diagnosis: Gastric Cancer and Gastroscopy Reports' Case.
Int. J. Interact. Multim. Artif. Intell., 2023

GeoDeformer: Geometric Deformable Transformer for Action Recognition.
CoRR, 2023

AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding.
CoRR, 2023

PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting.
CoRR, 2023

Improved YOLOv7 Based on Transformer for Object Detection in UAV-Captured Images.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

Diversifying Spatial-Temporal Perception for Video Domain Generalization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Unsupervised Deep Homography Estimation based on Transformer.
Proceedings of the International Conference on Advanced Robotics and Mechatronics, 2023

2022
Execution and perception of upper limb exoskeleton for stroke patients: a systematic review.
Intell. Serv. Robotics, 2022

Adversarial Partial Domain Adaptation by Cycle Inconsistency.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Analysis and Design of Drivetrain Control for the AEV With Network-Induced Compounding-Construction Loop Delays.
IEEE Trans. Veh. Technol., 2021

Video Mosaic of Arbitrary Viewing Direction Oriented on DAS.
Proceedings of the IEEE International Conference on Signal Processing, 2021

Graph-Based High-Order Relation Modeling for Long-Term Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Novelty Detection-Based Automated Anomaly Identification via Optimized Deep Generative Model.
Proceedings of the Big Data - 9th CCF Conference, 2021

2020
Toward Flotation Process Operation-State Identification via Statistical Modeling of Biologically Inspired Gabor Filtering Responses.
IEEE Trans. Cybern., 2020

Research on the Damping Effect Mechanism and Optimization of Super-High-Speed Electric Air Compressors for Fuel Cell Vehicles Under the Stiffness Softening Effect.
IEEE Access, 2020

2019
FLONet: Fewer Labeling Cost Active Learning for Deep Neural Network.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

LADet: A Light-weight and Adaptive Network for Multi-scale Object Detection.
Proceedings of The 11th Asian Conference on Machine Learning, 2019

