Yu-An Chung

CoRR, March, 2026

2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages.

CoRR, November, 2025

SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision.

Trans. Mach. Learn. Res., 2025

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

CoLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Seamless: Multilingual Expressive and Streaming Speech Translation.

Loïc Barrault

Mariano Coria Meglioli

David Dale

Ning Dong

Mark Duppenthaler

Kaushik Ram Sadagopan

Gabriel Mejia Gonzalez

CoRR, 2023

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation.

Seamless Communication

Loïc Barrault

Mariano Coria Meglioli

David Dale

Ning Dong

Kaushik Ram Sadagopan

Gabriel Mejia Gonzalez

CoRR, 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Speech-to-Speech Translation for a Real-world Unwritten Language.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Self-Supervised Learning for Speech Processing.

PhD thesis, 2022

Autoregressive Predictive Coding: A Comprehensive Study.

IEEE J. Sel. Top. Signal Process., 2022

SSAST: Self-Supervised Audio Spectrogram Transformer.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation.

Yuan Gong

IEEE ACM Trans. Audio Speech Lang. Process., 2021

SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training.

CoRR, 2021

PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation.

Yuan Gong

CoRR, 2021

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding.

Chenguang Zhu

Michael Zeng

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies.

Alexander H. Liu

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

AST: Audio Spectrogram Transformer.

Yuan Gong

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Similarity Analysis of Self-Supervised Speech Representations.

Yonatan Belinkov

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Semi-Supervised Speech-Language Joint Pre-Training for Spoken Language Understanding.

Chenguang Zhu

Michael Zeng

CoRR, 2020

Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification.

Wei-Hung Weng

Schrasing Tong

CoRR, 2020

Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation.

Shao-Wen Yang

Hsuan-Tien Lin

Proceedings of the International Conference on Technologies and Applications of Artificial Intelligence, 2020

Vector-Quantized Autoregressive Predictive Coding.

Hao Tang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Generative Pre-Training for Speech with Autoregressive Predictive Coding.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders.

Peter J. Liu

Jie Ren

CoRR, 2019

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models.

Wei Fang

CoRR, 2019

Unsupervised Clinical Language Translation.

Wei-Hung Weng

Peter Szolovits

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Towards Unsupervised Speech-to-text Translation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Supervised and Unsupervised Transfer Learning for Question Answering.

Hung-yi Lee

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval.

Wei-Hung Weng

CoRR, 2017

Learning Word Embeddings from Speech.

CoRR, 2017

libact: Pool-based Active Learning in Python.

CoRR, 2017

2016

Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation.

Hsuan-Tien Lin

CoRR, 2016

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Cost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning.
