Xuedong Huang

Orcid: 0000-0003-4550-7908

Affiliations:
  • Microsoft Research, Redmond, WA, USA


According to our database, Xuedong Huang authored at least 96 papers between 1989 and 2023.

Awards

ACM Fellow

ACM Fellow 2016, "For contributions to spoken language processing".

IEEE Fellow

IEEE Fellow 2000, "For contributions to development of speech technology, standards, and products."

Bibliography

2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
CoRR, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI.
CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization.
CoRR, 2022

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

Personalized speech enhancement: new models and Comprehensive evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Florence: A New Foundation Model for Computer Vision.
CoRR, 2021

Leveraging Lead Bias for Zero-shot Abstractive News Summarization.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Enhancing Factual Consistency of Abstractive Summarization.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.
Proceedings of the 38th International Conference on Machine Learning, 2021

Fusing Context Into Knowledge Graph for Commonsense Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Fusing Context Into Knowledge Graph for Commonsense Reasoning.
CoRR, 2020

Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization.
CoRR, 2020

End-to-End Abstractive Summarization for Meetings.
CoRR, 2020

Boosting Factual Correctness of Abstractive Summarization with Knowledge Graph.
CoRR, 2020

Mixed-Lingual Pre-training for Cross-lingual Summarization.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019
Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization.
CoRR, 2019

Meeting Transcription Using Virtual Microphone Arrays.
CoRR, 2019

SIM: A Slot-Independent Neural Model for Dialogue State Tracking.
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, 2019

Meeting Transcription Using Asynchronous Distant Microphones.
Proceedings of the Interspeech 2019, 2019

Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019


2018
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering.
CoRR, 2018

Achieving Human Parity on Automatic Chinese to English News Translation.
CoRR, 2018

The Microsoft 2017 Conversational Speech Recognition System.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Big Data for Speech and Language Processing.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Toward Human Parity in Conversational Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

The Microsoft 2016 conversational speech recognition system.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016
Achieving Human Parity in Conversational Speech Recognition.
CoRR, 2016

2015
Large-Scale Question Answering with Joint Embedding and Proof Tree Decoding.
Proceedings of the 24th ACM International Conference on Information and Knowledge Management, 2015

2014
Web Information at Your Fingertips: Paper as an Interaction Metaphor.
Computer, 2014

A historical perspective of speech recognition.
Commun. ACM, 2014

2010
An Overview of Modern Speech Recognition.
Proceedings of the Handbook of Natural Language Processing, Second Edition., 2010

2008
International workshop on question answering on the web (QAWeb2008).
Proceedings of the 17th International Conference on World Wide Web, 2008

2004
Speech and Language Processing for Multimodal Human-Computer Interaction.
J. VLSI Signal Process., 2004

Challenges in adopting speech recognition.
Commun. ACM, 2004

Direct filtering for air- and bone-conductive microphones.
Proceedings of the IEEE 6th Workshop on Multimedia Signal Processing, 2004

Enabling natural computing.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Multi-sensory microphones for robust speech detection, enhancement and recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2002
Distributed speech processing in miPad's multimodal user interface.
IEEE Trans. Speech Audio Process., 2002

A speech-centric perspective for human-computer interface.
Proceedings of the IEEE 5th Workshop on Multimedia Signal Processing, 2002

2001

High-performance robust speech recognition using stereo training data.
Proceedings of the IEEE International Conference on Acoustics, 2001

2000
Subword-dependent speaker clustering for improved speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Mipad: a next generation PDA prototype.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Large-vocabulary speech recognition under adverse acoustic environments.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

A unified context-free grammar and n-gram model for spoken language processing.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Improvements on speech recognition for fast talkers.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Unified decoding and feature representation for improved speech recognition.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Improved topic-dependent language modeling using information retrieval techniques.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Can continuous speech recognizers handle isolated speech?
Speech Commun., 1998

HMM-based smoothing for concatenative speech synthesis.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Vocabulary-independent word confidence measure using subword features.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

How effective is unsupervised data collection for children's speech recognition?
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Dynamically configurable acoustic models for speech recognition.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Automatic generation of synthesis units for trainable text-to-speech systems.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

1997
Improvements on a trainable letter-to-sound converter.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Recent improvements on Microsoft's trainable text-to-speech system-Whistler.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Predicting unseen triphones with senones.
IEEE Trans. Speech Audio Process., 1996

Whistler: a trainable text-to-speech system.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Deleted interpolation and density sharing for continuous hidden Markov models.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

Improvements on the pronunciation prefix tree search organization.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

Speaker and gender normalization for continuous-density hidden Markov models.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995
Microsoft Windows highly intelligent speech recognizer: Whisper.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
Session 2: Language Modeling.
Proceedings of the Human Language Technology, 1994

Improving speech recognition performance via phone-dependent VQ codebooks and adaptive language models in SPHINX-II.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1993
Shared-distribution hidden Markov models for speech recognition.
IEEE Trans. Speech Audio Process., 1993

On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition.
IEEE Trans. Speech Audio Process., 1993

A comparative study of discrete, semicontinuous, and continuous hidden Markov models.
Comput. Speech Lang., 1993

The SPHINX-II speech recognition system: an overview.
Comput. Speech Lang., 1993

Efficient Cepstral Normalization For Robust Speech Recognition.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

An Overview of the SPHINX-II Speech Recognition System.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

Senones, multi-pass search, and unified stochastic modeling in sphinx-II.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

Unified stochastic engine (USE) for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1993

An improved search algorithm using incremental knowledge for continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1993

1992
Speech Understanding in Open Tasks.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Improvements in Stochastic Language Modeling.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Subphonetic Modeling for Speech Recognition.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Minimizing Speaker Variation Effects for Speaker-Independent Speech Recognition.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Applying SPHINX-II to the DARPA Wall Street Journal CSR Task.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

Exploiting correlations among competing models with application to large vocabulary speech recognition.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

Subphonetic modeling with Markov states-Senone.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

Speaker normalization for speech recognition.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

1991
A Study on Speaker-Adaptive Speech Recognition.
Proceedings of the Speech and Natural Language, 1991

Acoustic distribution clustering in phonetic hidden Markov models.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

Improved acoustic modeling with the SPHINX speech recognition system.
Proceedings of the 1991 International Conference on Acoustics, 1991

1990
Speech recognition using hidden Markov models: A CMU perspective.
Speech Commun., 1990

Improved Hidden Markov Modeling for Speaker-Independent Continuous Speech Recognition.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990

On semi-continuous hidden Markov modeling.
Proceedings of the 1990 International Conference on Acoustics, 1990

1989
Large-vocabulary speaker-independent continuous speech recognition with semi-continuous hidden Markov models.
Proceedings of the First European Conference on Speech Communication and Technology, 1989

