Jiaming Zhou

Orcid: 0009-0002-4819-4572

Affiliations:
  • Nankai University, College of Computer Science, Tianjin, China


According to our database, Jiaming Zhou authored at least 42 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning.
CoRR, April, 2026

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding.
CoRR, January, 2026

Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue.
CoRR, January, 2026

DIFFA: Large Language Diffusion Models Can Listen and Understand.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.
CoRR, October, 2025

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.
CoRR, October, 2025

AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation.
CoRR, October, 2025

WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations.
CoRR, October, 2025

MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning.
CoRR, September, 2025

Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning.
CoRR, September, 2025

GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR.
CoRR, September, 2025

RealTalk-CN: A Realistic Chinese Speech-Text Dialogue Benchmark With Cross-Modal Interaction Analysis.
CoRR, August, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.
CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.
CoRR, May, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors.
CoRR, March, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.
CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.
CoRR, January, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.
IEEE Signal Process. Lett., 2025

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.
Proceedings of the Natural Language Processing and Chinese Computing, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chinese-LiPS: A Chinese Audio-Visual Speech Recognition Dataset with Lip-Reading and Presentation Slides.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.
CoRR, 2024

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.
CoRR, 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

PB-LRDWWS System For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Uncertainty-Aware Mean Opinion Score Prediction.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

CIF-T: A Novel CIF-Based Transducer Architecture for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
