Hao Tang

CoRR, May, 2026

2025

Speech-FT: A Fine-tuning Strategy for Enhancing Speech Representation Models Without Compromising Generalization Ability.

CoRR, February, 2025

Effective Context in Neural Speech Models.

Yen Meng

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Whisper Has an Internal Word Aligner.

Yen Meng

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Estimating the Completeness of Discrete Speech Units.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

A Simple HMM with Self-Supervised Representations for Phone Segmentation.

Gene-Ping Yang

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Property Neurons in Self-Supervised Speech Transformers.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models.

Tzu-Quan Lin

Hung-yi Lee

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech.

Proceedings of the 46th Annual Meeting of the Cognitive Science Society, 2024

2023

Improving Seq2Seq TTS Frontends With Transcribed Speech Audio.

Siqi Sun

Korin Richmond

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces.

Oli Danyi Liu

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Analyzing Acoustic Word Embeddings from Pre-Trained Self-Supervised Speech Models.

Ramon Sanabria

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Towards Matching Phones and Speech Representations.

Gene-Ping Yang

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

MelHuBERT: A Simplified HuBERT on Mel Spectrograms.

Tzu-Quan Lin

Hung-yi Lee

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Autoregressive Predictive Coding: A Comprehensive Study.

IEEE J. Sel. Top. Signal Process., 2022

Compressing Transformer-based self-supervised models for speech processing.

CoRR, 2022

MelHuBERT: A simplified HuBERT on Mel spectrogram.

Tzu-Quan Lin

Hung-yi Lee

CoRR, 2022

Autoregressive Co-Training for Learning Discrete Speech Representations.

CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Autoregressive Co-Training for Learning Discrete Speech Representation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Phonetic Analysis of Self-supervised Representations of English Speech.

Dan Wells

Korin Richmond

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition.

Gene-Ping Yang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

On the Difficulty of Segmenting Words with Attention.

Ramon Sanabria

CoRR, 2021

2020

Vector-Quantized Autoregressive Predictive Coding.

Yu-An Chung

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT.

François Grondin

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.

Achintya Kumar Sarkar

IEEE ACM Trans. Audio Speech Lang. Process., 2019

VoiceID Loss: Speech Enhancement for Speaker Verification.

Suwon Shon

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Deep Residual Network for Large-Scale Acoustic Scene Analysis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

On The Inductive Bias of Words in Acoustics-to-Word Models.

CoRR, 2018

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model.

Suwon Shon

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.

Wei-Ning Hsu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

ASR for Under-Resourced Languages From Probabilistic Transcription.

Mark A. Hasegawa-Johnson

Preethi Jyothi

Daniel McCloy

Majid Mirbagheri

Giovanni M. Di Liberto

IEEE ACM Trans. Audio Speech Lang. Process., 2017

End-to-End Neural Segmental Models for Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2017

Lexicon-free fingerspelling recognition from video: Data, models, and signer adaptation.

Gregory Shakhnarovich

Diane Brentari

Comput. Speech Lang., 2017

Sequence Prediction with Neural Segmental Models.

CoRR, 2017

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

End-to-end training approaches for discriminative segmental models.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Triphone State-Tying via Deep Canonical Correlation Analysis.

Weiran Wang

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Efficient Segmental Cascades for Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Adapting ASR for under-resourced languages using mismatched transcriptions.

Mark Hasegawa-Johnson

Sanjeev Khudanpur

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Signer-independent fingerspelling recognition with deep neural network adaptation.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Discriminative segmental cascades for feature-rich phone recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

A comparison of training approaches for discriminative segmental models.

Kevin Gimpel

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Log-linear dialog manager.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2012

Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach.

Joseph Keshet

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, July 8-14, 2012, Jeju Island, Korea, 2012

2010

An initial attempt for phoneme recognition using Structured Support Vector Machine (SVM).
