Jiaming Zhou

Orcid: 0009-0002-4819-4572

Affiliations:

Nankai University, College of Computer Science, Tianjin, China

According to our database¹, Jiaming Zhou authored at least 44 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

CosyEdit2: Speech-Editing-Oriented Reinforcement Learning Unlocks Better Zero-Shot TTS.

CoRR, May, 2026

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning.

CoRR, April, 2026

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding.

CoRR, January, 2026

Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue.

CoRR, January, 2026

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

RealTalk-CN: A Realistic Chinese Speech Task-Oriented Dialogue Benchmark with Cross-Modal Analysis.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

DIFFA: Large Language Diffusion Models Can Listen and Understand.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.

CoRR, October, 2025

AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation.

CoRR, October, 2025

WildElder: A Chinese Elderly Speech Dataset from the Wild with Fine-Grained Manual Annotations.

CoRR, October, 2025

MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning.

CoRR, September, 2025

Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning.

CoRR, September, 2025

GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR.

CoRR, September, 2025

RealTalk-CN: A Realistic Chinese Speech-Text Dialogue Benchmark With Cross-Modal Interaction Analysis.

CoRR, August, 2025

StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling.

CoRR, June, 2025

EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations.

CoRR, May, 2025

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition.

CoRR, February, 2025

MusicEval: A Generative Music Corpus with Expert Ratings for Automatic Text-to-Music Evaluation.

CoRR, January, 2025

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling.

IEEE Signal Process. Lett., 2025

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.

Proceedings of the Natural Language Processing and Chinese Computing, 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chinese-LiPS: A Chinese Audio-Visual Speech Recognition Dataset with Lip-Reading and Presentation Slides.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Emotion-Preserving Prosody Anonymization Network for Voice Privacy Protection.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5.

CoRR, 2024

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs.

CoRR, 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

PB-LRDWWS System For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Uncertainty-Aware Mean Opinion Score Prediction.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels.

Proceedings of the IEEE International Conference on Acoustics, 2024

CIF-T: A Novel CIF-Based Transducer Architecture for Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

MADI: Inter-Domain Matching and Intra-Domain Discrimination for Cross-Domain Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023
