Wen Wang

Orcid: 0000-0002-0356-1968

Affiliations:
  • Alibaba Group, DAMO Academy, Speech Lab, Sunnyvale, CA, USA
  • SRI International, Menlo Park, CA, USA (2002 - 2018)
  • Purdue University, West Lafayette, IN, USA (PhD 2002)


According to our database, Wen Wang authored at least 128 papers between 2000 and 2025.

Bibliography

2025
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models.
CoRR, August, 2025

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing.
CoRR, June, 2025

OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment.
CoRR, June, 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training.
CoRR, May, 2025

Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization.
CoRR, May, 2025

OmniAudio: Generating Spatial Audio from 360-Degree Video.
CoRR, April, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.
CoRR, January, 2025

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Tuning Large Language Model for Speech Recognition With Mixed-Scale Re-Tokenization.
IEEE Signal Process. Lett., 2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation.
CoRR, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts.
CoRR, 2024

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World.
CoRR, 2024

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers.
CoRR, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec.
CoRR, 2024

CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification.
CoRR, 2024

Loss Masking Is Not Needed In Decoder-Only Transformer For Discrete-Token-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2024

Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

Improving BERT with Hybrid Pooling Network and Drop Mask.
CoRR, 2023

Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling.
CoRR, 2023

Enhancing Generation through Summarization Duality and Explicit Outline Control.
CoRR, 2023

Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MUG: A General Meeting Understanding and Generation Benchmark.
Proceedings of the IEEE International Conference on Acoustics, 2023

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG).
Proceedings of the IEEE International Conference on Acoustics, 2023

Weighted Sampling for Masked Language Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023

Adaptive Knowledge Distillation Between Text and Speech Pre-Trained Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Auxiliary Pooling Layer For Spoken Language Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023

Meeting Action Item Detection with Regularized Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation.
CoRR, 2022

Non-autoregressive Translation with Dependency-Aware Decoder.
CoRR, 2022

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences.
Proceedings of the Tenth International Conference on Learning Representations, 2022

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Reducing BERT Computation by Padding Removal and Curriculum Learning.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Pre-Training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Discriminative Self-Training for Punctuation Prediction.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Sequential neural networks for noetic end-to-end response selection.
Comput. Speech Lang., 2020

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models.
CoRR, 2019

BERT for Joint Intent Classification and Slot Filling.
CoRR, 2019

Sequential Attention-based Network for Noetic End-to-End Response Selection.
CoRR, 2019

Sequential Matching Model for End-to-end Multi-turn Response Selection.
Proceedings of the IEEE International Conference on Acoustics, 2019

Transfer Learning for Context-Aware Spoken Language Understanding.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Robust Features in Deep-Learning-Based Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Toward human-assisted lexical unit discovery without text resources.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation.
CoRR, 2015

Morphological Modeling for Machine Translation of English-Iraqi Arabic Spoken Dialogs.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Combating reverberation in large vocabulary continuous speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Improving robustness against reverberation for automatic speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Detection of Demographics and Identity in Spontaneous Speech and Writing.
Proceedings of the Multimedia Data Mining and Analytics - Disruptive Innovation, 2015

2014
Deep convolutional nets and robust features for reverberation-robust speech recognition.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

ISOMER: Informative Segment Observations for Multimedia Event Recounting.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

ASR error detection using recurrent neural network language model and complementary ASR.
Proceedings of the IEEE International Conference on Acoustics, 2014

Highly accurate phonetic segmentation using boundary correction models and system fusion.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
SRI's Submissions to Chinese-English PatentMT NTCIR10 Evaluation.
Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, 2013

A Cross-language Study on Automatic Speech Disfluency Detection.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Automatic phonetic segmentation using boundary models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Articulatory trajectories for large-vocabulary speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Using multiple versions of speech input in phone recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Rich system combination for keyword spotting in noisy and acoustically heterogeneous audio streams.
Proceedings of the IEEE International Conference on Acoustics, 2013

Name-aware Machine Translation.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Detecting leadership and cohesion in spoken interactions.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Joint bilingual name tagging for parallel corpora.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Identifying Agreement/Disagreement in Conversational Speech: A Cross-Lingual Study.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Automatic identification of speaker role and agreement/disagreement in broadcast conversation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Detection of Agreement and Disagreement in Broadcast Conversations.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

N-Best Rescoring Based on Pitch-accent Patterns.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

2010
Implementing SRI's Pashto speech-to-speech translation system on a smart phone.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Unsupervised domain adaptation with multiple acoustic models.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Automatic disfluency removal for improving spoken language translation.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules.
IEEE Trans. Speech Audio Process., 2009

Anchored Speech Recognition for Question Answering.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Using syntax in large-scale audio document translation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Multifactor adaptation for Mandarin broadcast news and conversation speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Data-driven lexicon expansion for Mandarin broadcast news and conversation speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

Recent advances in SRI'S IraqComm™ Iraqi Arabic-English speech-to-speech translation system.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
Speech segmentation and spoken document processing.
IEEE Signal Process. Mag., 2008

Efficient data selection for machine translation.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Phonetic name matching for cross-lingual Spoken Sentence Retrieval.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Development of SRI's translation systems for broadcast news and broadcast conversations.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Development of the SRI/nightingale Arabic ASR system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems.
Proceedings of the COLING 2008, 2008

2007
Integrating MAP, marginals, and unsupervised language model adaptation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

The SRI/OGI 2006 spoken term detection system.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Advances in Mandarin broadcast speech recognition.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Semi-Supervised Learning for Part-of-Speech Tagging of Mandarin Transcribed Speech.
Proceedings of the IEEE International Conference on Acoustics, 2007

Mandarin Part-of-Speech Tagging and Discriminative Reranking.
Proceedings of the EMNLP-CoNLL 2007, 2007

Reranking machine translation hypotheses with structured and web-based language models.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Building a highly accurate Mandarin speech recognizer.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Recent innovations in speech-to-text transcription at SRI-ICSI-UW.
IEEE Trans. Speech Audio Process., 2006

Impact of Automatic Comma Prediction on POS/Name Tagging of speech.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Investigation on Mandarin broadcast news speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

The Use of Word N-Grams and Parts of Speech for Hierarchical Cluster Language Modeling.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Speech translation for low-resource languages: the case of Pashto.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004
Progress on Mandarin conversational telephone speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

An efficient repair procedure for quick transcriptions.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The use of a linguistically motivated language model in conversational speech recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Techniques for effective vocabulary selection.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

The robustness of an almost-parsing language model given errorful training data.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Rescoring effectiveness of language models using different levels of knowledge and their integration.
Proceedings of the IEEE International Conference on Acoustics, 2002

The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources.
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002

2000
The Effectiveness of Corpus-Induced Dependency Grammars for Post-processing Speech.
Proceedings of the 6th Applied Natural Language Processing Conference, 2000

