David F. Harwath

Wei-Ning Hsu

Proceedings of the 8th International Conference on Learning Representations, 2020

Trilingual Semantic Embeddings of Visually Grounded Speech with Self-Attention Mechanisms.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Transfer Learning from Audio-Visual Grounding to Speech Recognition.

Wei-Ning Hsu

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio.

Emmanuel Azuh

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Visually Grounded Sub-word Speech Unit Discovery.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Learning Words by Drawing Images.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Grounding Spoken Words in Unlabeled Video.

Rogério Schmidt Feris

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018

Learning spoken language through vision.

David Frank Harwath

PhD thesis, 2018

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech.

Galen Chuang

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input.

Proceedings of the Computer Vision - ECCV 2018, 2018

2017

Learning modality-invariant representations for speech and images.

Kenneth Leidal

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Learning Word-Like Units from Joint Audio-Visual Analysis.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016

On the Use of Acoustic Unit Discovery for Language Recognition.

IEEE/ACM Trans. Audio Speech Lang. Process., 2016

Look, listen, and decode: Multimodal speech recognition with images.

Felix Sun

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Unsupervised Learning of Spoken Language with Visual Context.

Antonio Torralba

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015

Deep multimodal semantic embeddings for speech and images.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Choosing useful word alternates for automatic speech recognition correction interfaces.

Alexander Gruenstein

Ian McGraw

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Zero resource spoken audio corpus analysis.

Timothy J. Hazen

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012

Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech.
