Xie Chen

Orcid: 0000-0001-7423-617X

Affiliations:
  • Shanghai Jiao Tong University, China
  • Microsoft, Redmond, WA, USA (former)
  • University of Cambridge, UK (former)


According to our database, Xie Chen authored at least 74 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

BAT: Learning to Reason about Spatial Sounds with Large Language Models.
CoRR, 2024

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering.
CoRR, 2024

EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.
CoRR, 2024

2023
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
CoRR, 2023

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.
CoRR, 2023

Acoustic BPE for Speech Generation with Discrete Tokens.
CoRR, 2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition.
CoRR, 2023

Improved Factorized Neural Transducer Model For text-only Domain Adaptation.
CoRR, 2023

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer.
CoRR, 2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
CoRR, 2023

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching.
CoRR, 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.
CoRR, 2023

DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
CoRR, 2023

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
CoRR, 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.
CoRR, 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR, 2023

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
CoRR, 2023

Blank-regularized CTC for Frame Skipping in Neural Transducer.
CoRR, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2023

Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Fast-HuBERT: an Efficient Training Framework for Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models.
CoRR, 2022

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.
CoRR, 2022

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition.
CoRR, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Proceedings of the Interspeech 2022, 2022

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
Proceedings of the Interspeech 2022, 2022

Factorized Neural Transducer for Efficient Language Model Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Memory-Efficient Pipeline-Parallel DNN Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition.
CoRR, 2020

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Long-span language modeling for speech recognition.
CoRR, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.
Proceedings of the IEEE International Conference on Acoustics, 2019

Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Investigation of Sampling Techniques for Maximum Entropy Language Modeling Training.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Active Memory Networks for Language Modeling.
Proceedings of the Interspeech 2018, 2018

Neural Network Language Modeling with Letter-Based Features and Importance Sampling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

The Effect of Adding Authorship Knowledge in Automated Text Scoring.
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications@NAACL-HLT 2018, 2018

2017
Future Word Contexts in Neural Network Language Models.
CoRR, 2017

Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the Interspeech 2017, 2017

Exploiting the Tibetan Radicals in Recurrent Neural Network for Low-Resource Language Models.
Proceedings of the Neural Information Processing - 24th International Conference, 2017

Recurrent neural network language models for keyword search.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Future word contexts in neural network language models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Multi-Language Neural Network Language Models.
Proceedings of the Interspeech 2016, 2016

CUED-RNNLM - An open-source toolkit for efficient training and evaluation of recurrent neural network language models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Recurrent neural network language model adaptation for multi-genre broadcast speech recognition.
Proceedings of the INTERSPEECH 2015, 2015

Paraphrastic recurrent neural network language models.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Robust excitation-based features for Automatic Speech Recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Recurrent neural network language model training with noise contrastive estimation for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving the training and evaluation efficiency of recurrent neural network language models.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Investigation of back-off based interpolation between recurrent neural network and n-gram language models.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch.
Proceedings of the INTERSPEECH 2014, 2014

An initial investigation of long-term adaptation for meeting transcription.
Proceedings of the INTERSPEECH 2014, 2014

Impact of single-microphone dereverberation on DNN-based meeting transcription systems.
Proceedings of the IEEE International Conference on Acoustics, 2014

Efficient lattice rescoring using recurrent neural network language models.
Proceedings of the IEEE International Conference on Acoustics, 2014

2012
Pipelined Back-Propagation for Context-Dependent Deep Neural Networks.
Proceedings of the INTERSPEECH 2012, 2012

2011
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

