Qian Chen

ORCID: 0000-0002-1263-9560

Affiliations:

Alibaba Group, DAMO Academy, Speech Lab, China
University of Science and Technology of China, National Engineering Laboratory of Speech and Language Information Processing, Hefei, China

According to our database, Qian Chen authored at least 106 papers between 2015 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Eliminating stability hallucinations in llm-based tts models via attention guidance.

CoRR, September, 2025

Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling.

CoRR, September, 2025

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding.

CoRR, September, 2025

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models.

CoRR, August, 2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.

CoRR, June, 2025

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model.

CoRR, June, 2025

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment.

CoRR, June, 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.

CoRR, May, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

CoRR, April, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.

CoRR, March, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.

CoRR, January, 2025

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model.

CoRR, January, 2025

Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization on Multi-party Conversation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization.

IEEE Signal Process. Lett., 2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.

CoRR, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.

CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.

CoRR, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization.

CoRR, 2024

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.

CoRR, 2024

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World.

CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.

CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.

CoRR, 2024

CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

E-Chat: Emotion-Sensitive Spoken Dialogue System with Large Language Models.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.

Proceedings of the IEEE International Conference on Acoustics, 2024

Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control.

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation.

CoRR, 2023

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision.

Yafeng Chen

Siqi Zheng

Qian Chen

CoRR, 2023

Improving BERT with Hybrid Pooling Network and Drop Mask.

CoRR, 2023

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement.

CoRR, 2023

Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling.

CoRR, 2023

Enhancing Generation through Summarization Duality and Explicit Outline Control.

CoRR, 2023

Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CASA-ASR: Context-Aware Speaker-Attributed ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MUG: A General Meeting Understanding and Generation Benchmark.

Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).

Proceedings of the IEEE International Conference on Acoustics, 2023

Weighted Sampling for Masked Language Modeling.

Proceedings of the IEEE International Conference on Acoustics, 2023

Auxiliary Pooling Layer For Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, 2023

Meeting Action Item Detection with Regularized Context Modeling.

Proceedings of the IEEE International Conference on Acoustics, 2023

Pushing the Limits of Self-Supervised Speaker Verification using Regularized Distillation Framework.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.

CoRR, 2022

Non-autoregressive Translation with Dependency-Aware Decoder.

CoRR, 2022

PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification.

Siqi Zheng

Hongbin Suo

Qian Chen

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2022

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

BeamTransformer: Microphone Array-based Overlapping Speech Detection.

CoRR, 2021

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness.

Benjamin I. P. Rubinstein

Ce Zhang

Bo Li

CoRR, 2021

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness.

Benjamin I. P. Rubinstein

Ce Zhang

Bo Li

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning.

Qian Chen

Wen Wang

Qinglin Zhang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Discriminative Self-Training for Punctuation Prediction.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Sequential neural networks for noetic end-to-end response selection.

Qian Chen

Wen Wang

Comput. Speech Lang., 2020

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.

CoRR, 2019

Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference.

CoRR, 2019

BERT for Joint Intent Classification and Slot Filling.

Qian Chen

Zhu Zhuo

Wen Wang

CoRR, 2019

Sequential Attention-based Network for Noetic End-to-End Response Selection.

Qian Chen

Wen Wang

CoRR, 2019

Sequential Matching Model for End-to-end Multi-turn Response Selection.

Qian Chen

Wen Wang

Proceedings of the IEEE International Conference on Acoustics, 2019

Transfer Learning for Context-Aware Spoken Language Understanding.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

A Sequential Neural Encoder With Latent Structured Description for Modeling Sentences.

Yu-Ping Ruan

Qian Chen

Zhen-Hua Ling

IEEE ACM Trans. Audio Speech Lang. Process., 2018

VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results.

Konstantinos Avgerinakis

Naveen Kumar Vedurupaka

Nehal Mamgain

Nitin Bansal

Oliver Acatay

Panagiotis Giannakeris

Vineeth N. Balasubramanian

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Enhancing Sentence Embedding with Generalized Pooling.

Qian Chen

Zhen-Hua Ling

Xiaodan Zhu

Proceedings of the 27th International Conference on Computational Linguistics, 2018

Neural Natural Language Inference Models Enhanced with External Knowledge.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

Natural Language Inference with External Knowledge.

CoRR, 2017

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering.

CoRR, 2017

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference.

Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, 2017

Enhanced LSTM for Natural Language Inference.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016

Distraction-Based Neural Networks for Document Summarization.

CoRR, 2016

Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference.

CoRR, 2016

Distraction-Based Neural Networks for Modeling Document.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015

Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Revisiting Word Embedding for Contrasting Meaning.

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015
