Shujie Liu

  • Microsoft Research Asia, Beijing, China

According to our database, Shujie Liu authored at least 153 papers between 2008 and 2024.

VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.
IEEE Trans. Multim., 2024

SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Autoregressive Speech Synthesis without Vector Quantization.
CoRR, 2024

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment.
CoRR, 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.
CoRR, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model.
CoRR, 2024

Boosting Large Language Model for Speech Synthesis: An Empirical Study.
CoRR, 2024

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR, 2023

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction.
CoRR, 2023

WavMark: Watermarking for Audio Generation.
CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accelerating Transducers through Adjacent Token Merging.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BEATs: Audio Pre-Training with Acoustic Tokenizers.
Proceedings of the International Conference on Machine Learning, 2023

Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023

DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2023

Target Sound Extraction with Variable Cross-Modality Clues.
Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2023

Prosody-Aware SpeechT5 for Expressive Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

BEATs: Audio Pre-Training with Acoustic Tokenizers.
CoRR, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.
CoRR, 2022

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task.
CoRR, 2022

Exploring WavLM on Speech Enhancement.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Two-Stream Network for Sign Language Recognition and Translation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Pre-training with Acoustic Piece.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Configurable Multilingual Model is All You Need to Recognize All Languages.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2022

Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Multi-View Self-Attention Based Transformer for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Self-Supervised Learning for speech recognition with Intermediate layer supervision.
CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.
CoRR, 2021

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data.
Proceedings of the 38th International Conference on Machine Learning, 2021

GraphCodeBERT: Pre-training Code Representations with Data Flow.
Proceedings of the 9th International Conference on Learning Representations, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Jointly Learning to Repair Code and Generate Commit Message.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Grammar-Based Patches Generation for Automated Program Repair.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Joint Learning of Question Answering and Question Generation.
IEEE Trans. Knowl. Data Eng., 2020

A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation.
IEEE Trans. Fuzzy Syst., 2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis.
CoRR, 2020

Continuous Speech Separation with Conformer.
CoRR, 2020

MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency End-to-End Streaming Speech Recognition with a Scout Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semantic Mask for Transformer Based End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Curriculum Pre-training for End-to-End Speech Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

A Graph-based Coarse-to-fine Method for Unsupervised Bilingual Lexicon Induction.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

MuTual: A Dataset for Multi-Turn Dialogue Reasoning.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

A Dataset for Low-Resource Stylized Sequence-to-Sequence Generation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

RobuTrans: A Robust Transformer-Based Text-to-Speech Model.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network.
CoRR, 2019

Source Dependency-Aware Transformer with Supervised Self-Attention.
CoRR, 2019

Unsupervised Context Rewriting for Open Domain Conversation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Explicit Cross-lingual Pre-training for Unsupervised Machine Translation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Regularizing Neural Machine Translation by Target-Bidirectional Agreement.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Unsupervised Neural Machine Translation with SMT as Posterior Regularization.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Neural Speech Synthesis with Transformer Network.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Close to Human Quality TTS with Transformer.
CoRR, 2018

Approximate Distribution Matching for Sequence-to-Sequence Learning.
CoRR, 2018

Style Transfer as Unsupervised Machine Translation.
CoRR, 2018

Regularizing Neural Machine Translation by Target-bidirectional Agreement.
CoRR, 2018

Achieving Human Parity on Automatic Chinese to English News Translation.
CoRR, 2018

Coarse-To-Fine Learning for Neural Machine Translation.
Proceedings of the Natural Language Processing and Chinese Computing, 2018

Learning to Collaborate for Question Answering and Asking.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Generative Bridging Network for Neural Sequence Prediction.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Bidirectional Generative Adversarial Networks for Neural Machine Translation.
Proceedings of the 22nd Conference on Computational Natural Language Learning, 2018

Triangular Architecture for Rare Language Translation.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Joint Training for Neural Machine Translation Models with Monolingual Data.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Assertion-Based QA With Question-Aware Open Information Extraction.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Neural Sequence Prediction by Coaching.
CoRR, 2017

Modeling Indicative Context for Statistical Machine Translation.
Proceedings of the Natural Language Processing and Chinese Computing, 2017

Stack-based Multi-layer Attention for Transition-based Dependency Parsing.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Chunk-based Decoder for Neural Machine Translation.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model.
CoRR, 2016

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation.
Proceedings of the COLING 2016, 2016

Knowledge-Based Semantic Embedding for Machine Translation.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Towards Machine Translation in Semantic Vector Space.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2015

Beyond Word-based Language Model in Statistical Machine Translation.
CoRR, 2015

A Statistical Parsing Framework for Sentiment Classification.
Comput. Linguistics, 2015

A Maximum Entropy Approach to Discourse Coherence Modeling.
Proceedings of the Natural Language Processing and Chinese Computing - 4th CCF Conference, 2015

Entity Translation with Collective Inference in Knowledge Graph.
Proceedings of the Natural Language Processing and Chinese Computing - 4th CCF Conference, 2015

Hierarchical Recurrent Neural Network for Document Modeling.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Woodpecker: An Automatic Methodology for Machine Translation Diagnosis with Rich Linguistic Knowledge.
J. Inf. Sci. Eng., 2014

Bilingually-constrained Phrase Embeddings for Machine Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

A Recursive Recurrent Neural Network for Statistical Machine Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Learning Topic Representation for SMT with Neural Networks.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

Collective Corpus Weighting and Phrase Scoring for SMT Using Graph-Based Random Walk.
Proceedings of the Natural Language Processing and Chinese Computing, 2013

Efficient Collective Entity Linking with Stacking.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Multi-Domain Adaptation for SMT Using Multi-Task Learning.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Word Alignment Modeling with Context Dependent Deep Neural Network.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Learning Entity Representation for Entity Disambiguation.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Bilingual Data Cleaning for SMT using Graph-based Random Walk.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

A novel 3D video transcoding scheme for adaptive 3D video transmission to heterogeneous terminals.
ACM Trans. Multim. Comput. Commun. Appl., 2012

Virtual View Reconstruction Using Temporal Information.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Re-training Monolingual Parser Bilingually for Syntactic SMT.
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012

Learning Translation Consensus with Structured Label Propagation.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, July 8-14, 2012, Jeju Island, Korea, 2012

New Depth Coding Techniques With Utilization of Corresponding Video.
IEEE Trans. Broadcast., 2011

Scalable video transmission: packet loss induced distortion modeling and estimation.
Proceedings of the Network and Operating System Support for Digital Audio and Video, 2011

A Unified SMT Framework Combining MIRA and MERT.
Proceedings of Machine Translation Summit XIII: Papers, 2011

Statistic Machine Translation Boosted with Spurious Word Deletion.
Proceedings of Machine Translation Summit XIII: Papers, 2011

Transductive Minimum Error Rate Training for Statistical Machine Translation.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Joint trilateral filtering for depth map compression.
Proceedings of the Visual Communications and Image Processing 2010, 2010

3D video transcoding for virtual views.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

The MSRA machine translation system for IWSLT 2010.
Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

A novel prioritized spatial multiplexing for MIMO wireless system with application to H.264 SVC video.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

Sparse dyadic mode for depth map compression.
Proceedings of the International Conference on Image Processing, 2010

Improved Discriminative ITG Alignment using Hierarchical Phrase Pairs and Semi-supervised Training.
Proceedings of the COLING 2010, 2010

Discriminative Pruning for Discriminative ITG Alignment.
Proceedings of the ACL 2010, 2010

Multiview Video transcoding: From multiple views to single view.
Proceedings of the 2009 Picture Coding Symposium, 2009

An EMD Based Approach to Transliteration Unit Alignment between English and Chinese.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Frame loss error concealment for multiview video coding.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

Low-complexity asymmetric multiview video coding.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points.
Proceedings of the COLING 2008, 2008
