Xuankai Chang

ORCID: 0000-0002-5221-5412

According to our database, Xuankai Chang authored at least 71 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
CoRR, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
CoRR, 2023

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation.
CoRR, 2023

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.
CoRR, 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
CoRR, 2023

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks.
CoRR, 2023

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition.
CoRR, 2023

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios.
CoRR, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
CoRR, 2023

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
CoRR, 2023

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning.
CoRR, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.
CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.
CoRR, 2023

Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms.
CoRR, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Train from scratch: Single-stage joint training of speech separation and recognition.
Comput. Speech Lang., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.
CoRR, 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.
Proceedings of the Interspeech 2022, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the Interspeech 2022, 2022

End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.
Proceedings of the Interspeech 2022, 2022

Two-Pass Low Latency End-to-End Spoken Language Understanding.
Proceedings of the Interspeech 2022, 2022

Joint Speech Recognition and Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem.
CoRR, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Recent Developments on ESPnet Toolkit Boosted By Conformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Improving End-to-End Single-Channel Multi-Talker Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Proceedings of the Interspeech 2020, 2020

Insertion-Based Modeling for End-to-End Automatic Speech Recognition.
Proceedings of the Interspeech 2020, 2020

End-to-End ASR with Adaptive Span Self-Attention.
Proceedings of the Interspeech 2020, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System.
Proceedings of the Interspeech 2019, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Single-channel multi-talker speech recognition with permutation invariant training.
Speech Commun., 2018

Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Past review, current progress, and challenges ahead on the cocktail party problem.
Frontiers Inf. Technol. Electron. Eng., 2018

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks.
Proceedings of the Interspeech 2018, 2018

Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Recognizing Multi-Talker Speech with Permutation Invariant Training.
Proceedings of the Interspeech 2017, 2017

2016
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC.
Proceedings of the Interspeech 2016, 2016

