Wen Wang
ORCID: 0000-0002-0356-1968
Affiliations:
- Alibaba Group, DAMO Academy, Speech Lab, Sunnyvale, CA, USA
- SRI International, Menlo Park, CA, USA (2002 - 2018)
- Purdue University, West Lafayette, IN, USA (PhD 2002)
According to our database, Wen Wang authored at least 128 papers between 2000 and 2025.
Links
Online presence:
- on linkedin.com
- on orcid.org
Bibliography
2025
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models.
CoRR, August, 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment.
CoRR, June, 2025
CoRR, May, 2025
Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
CoRR, May, 2025
CoRR, April, 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025
CoRR, January, 2025
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
IEEE Signal Process. Lett., 2024
CoRR, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World.
CoRR, 2024
CoRR, 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024
CoRR, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling.
CoRR, 2023
CoRR, 2023
Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
CoRR, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
2021
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
2020
Comput. Speech Lang., 2020
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.
CoRR, 2019
CoRR, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017
2016
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
2015
Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation.
CoRR, 2015
Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Proceedings of the Multimedia Data Mining and Analytics - Disruptive Innovation, 2015
2014
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014
Deep convolutional nets and robust features for reverberation-robust speech recognition.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014
Proceedings of the International Conference on Multimedia Retrieval, 2014
Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
ASR error detection using recurrent neural network language model and complementary ASR.
Proceedings of the IEEE International Conference on Acoustics, 2014
Highly accurate phonetic segmentation using boundary correction models and system fusion.
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013
Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, 2013
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Rich system combination for keyword spotting in noisy and acoustically heterogeneous audio streams.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013
2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012
2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Automatic identification of speaker role and agreement/disagreement in broadcast conversation.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011
2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules.
IEEE Trans. Speech Audio Process., 2009
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Data-driven lexicon expansion for Mandarin broadcast news and conversation speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009
Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system.
Proceedings of the IEEE International Conference on Acoustics, 2009
2008
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008
Development of SRI's translation systems for broadcast news and broadcast conversations.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems.
Proceedings of the COLING 2008, 2008
2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the EMNLP-CoNLL 2007, 2007
Reranking machine translation hypotheses with structured and web-based language models.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
IEEE Trans. Speech Audio Process., 2006
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
The Use of Word N-Grams and Parts of Speech for Hierarchical Cluster Language Modeling.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
The use of a linguistically motivated language model in conversational speech recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
Rescoring effectiveness of language models using different levels of knowledge and their integration.
Proceedings of the IEEE International Conference on Acoustics, 2002
The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources.
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002
2000
Proceedings of the 6th Applied Natural Language Processing Conference, 2000