Qian Chen
Orcid: 0000-0002-1263-9560
Affiliations:
- Alibaba Group, DAMO Academy, Speech Lab, China
- University of Science and Technology of China, National Engineering Laboratory of Speech and Language Information Processing, Hefei, China
According to our database, Qian Chen authored at least 101 papers between 2015 and 2025.
Bibliography
2025
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models.
CoRR, August, 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model.
CoRR, June, 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment.
CoRR, June, 2025
CoRR, May, 2025
Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
CoRR, May, 2025
CoRR, April, 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025
CoRR, January, 2025
CoRR, January, 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.
Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
IEEE Signal Process. Lett., 2024
CoRR, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization.
CoRR, 2024
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World.
CoRR, 2024
CoRR, 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024
CoRR, 2024
Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.
CoRR, 2023
Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision.
CoRR, 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement.
CoRR, 2023
Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling.
CoRR, 2023
CoRR, 2023
Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
CoRR, 2022
PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
2021
TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness.
CoRR, 2021
TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
2020
Comput. Speech Lang., 2020
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.
CoRR, 2019
Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference.
CoRR, 2019
CoRR, 2019
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
A Sequential Neural Encoder With Latent Structured Description for Modeling Sentences.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018
Proceedings of the 27th International Conference on Computational Linguistics, 2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018
2017
Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering.
CoRR, 2017
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference.
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, 2017
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
2016
CoRR, 2016
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
2015
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015