Hao Tang

ORCID: 0000-0002-2445-2605

Affiliations:
  • Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
  • Toyota Technological Institute at Chicago, IL, USA (PhD 2017)
  • National Taiwan University, Taipei, Taiwan (former)


According to our database, Hao Tang authored at least 39 papers between 2009 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution.
Proceedings of the IEEE International Conference on Acoustics, 2023

Towards Matching Phones and Speech Representations.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

MelHuBERT: A Simplified HuBERT on Mel Spectrograms.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Autoregressive Predictive Coding: A Comprehensive Study.
IEEE J. Sel. Top. Signal Process., 2022

Compressing Transformer-based self-supervised models for speech processing.
CoRR, 2022

MelHuBERT: A simplified HuBERT on Mel spectrogram.
CoRR, 2022

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models.
CoRR, 2022

Autoregressive Co-Training for Learning Discrete Speech Representations.
CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Autoregressive Co-Training for Learning Discrete Speech Representation.
Proceedings of the Interspeech 2022, 2022

Supervised Attention in Sequence-to-Sequence Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

2020
Vector-Quantized Autoregressive Predictive Coding.
Proceedings of the Interspeech 2020, 2020

Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

VoiceID Loss: Speech Enhancement for Speaker Verification.
Proceedings of the Interspeech 2019, 2019

A Deep Residual Network for Large-Scale Acoustic Scene Analysis.
Proceedings of the Interspeech 2019, 2019

An Unsupervised Autoregressive Model for Speech Representation Learning.
Proceedings of the Interspeech 2019, 2019

2018
On the Inductive Bias of Words in Acoustics-to-Word Models.
CoRR, 2018

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition.
Proceedings of the Interspeech 2018, 2018

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition.
Proceedings of the Interspeech 2018, 2018

2017
ASR for Under-Resourced Languages From Probabilistic Transcription.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

End-to-End Neural Segmental Models for Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017

Lexicon-free fingerspelling recognition from video: Data, models, and signer adaptation.
Comput. Speech Lang., 2017

Sequence Prediction with Neural Segmental Models.
CoRR, 2017

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.
Proceedings of the Interspeech 2017, 2017

2016
End-to-end training approaches for discriminative segmental models.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Triphone State-Tying via Deep Canonical Correlation Analysis.
Proceedings of the Interspeech 2016, 2016

Efficient Segmental Cascades for Speech Recognition.
Proceedings of the Interspeech 2016, 2016

Adapting ASR for under-resourced languages using mismatched transcriptions.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Signer-independent fingerspelling recognition with deep neural network adaptation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Discriminative segmental cascades for feature-rich phone recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
A comparison of training approaches for discriminative segmental models.
Proceedings of the INTERSPEECH 2014, 2014

Log-linear dialog manager.
Proceedings of the IEEE International Conference on Acoustics, 2014

2012
Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2010
An initial attempt for phoneme recognition using Structured Support Vector Machine (SVM).
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Spoken term detection from bilingual spontaneous speech using code-switched lattice-based structures for words and subword units.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

