Alexander H. Liu

ORCID: 0000-0003-1628-0855

Affiliations:

Massachusetts Institute of Technology (MIT), Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
National Taiwan University, Taipei, Taiwan (former)

According to our database, Alexander H. Liu authored at least 40 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal.

Weihan Xu

Kan Jen Cheng

Koichi Saito

Muhammad Jehanzeb Mirza

Gopala Anumanchipalli

Paul Pu Liang

CoRR, December, 2025

Fugatto 1: Foundational Generative Audio Transformer Opus 1.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Full-Duplex-Bench: A Benchmark to Evaluate Full-Duplex Spoken Dialogue Models on Turn-taking Capabilities.

Gopala Anumanchipalli

Alexander H. Liu

Hung-yi Lee

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

USAD: Universal Speech and Audio Representation via Distillation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation.

CoRR, 2024

Towards audio language modeling - an overview.

CoRR, 2024

Codec-Superb @ SLT 2024: A Lightweight Benchmark For Neural Audio Codec Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Generative Pre-training for Speech with Flow Matching.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Listen, Think, and Understand.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective.

Alexander H. Liu

Sung-Lin Yeh

James R. Glass

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering.

Heng-Jui Chang

Alexander H. Liu

James R. Glass

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Contrastive Audio-Visual Masked Autoencoder.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Joint Audio and Speech Understanding.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

A Fully Integrated 1.7mW Attention-Based Automatic Speech Recognition Processor.

IEEE Trans. Circuits Syst. II Express Briefs, 2022

UAVM: Towards Unifying Audio and Visual Models.

IEEE Signal Process. Lett., 2022

UAVM: A Unified Model for Audio-Visual Learning.

CoRR, 2022

Towards End-to-End Unsupervised Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Simple and Effective Unsupervised Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Cross-Modal Discrete Representation Learning.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Routing with Self-Attention for Multimodal Capsule Networks.

CoRR, 2021

End-to-End Whispered Speech Recognition with Frequency-Weighted Approaches and Pseudo Whisper Pre-training.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies.

Alexander H. Liu

Yu-An Chung

James R. Glass

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning.

CoRR, 2020

Semi-Supervised Learning for Multi-Speaker Text-to-Speech Synthesis Using Discrete Speech Representation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Sequence-to-Sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model.

Alexander H. Liu

Hung-yi Lee

Lin-Shan Lee

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
