Bowen Shi

Yangyang Shi

Vikas Chandra

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation.

Triantafyllos Afouras

David Kant

Marcelo Sandoval-Castañeda

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Generative Pre-training for Speech with Flow Matching.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Audiobox: Unified Audio Generation with Natural Language Prompts.

CoRR, 2023

TTIC's Submission to WMT-SLT 23.

Gregory Shakhnarovich

Proceedings of the Eighth Conference on Machine Translation, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Comparative Layer-Wise Analysis of Self-Supervised Speech Models.

Ankita Pasad

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.

CoRR, 2022

A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer.

CoRR, 2022

TTIC's WMT-SLT 22 Sign Language Translation System.

Diane Brentari

Gregory Shakhnarovich

Proceedings of the Seventh Conference on Machine Translation, 2022

u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.

Abdelrahman Mohamed

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Robust Self-Supervised Audio-Visual Speech Recognition.

Abdelrahman Mohamed

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Open-Domain Sign Language Translation Learned from Online Video.

Diane Brentari

Gregory Shakhnarovich

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Searching for fingerspelled content in American Sign Language.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings.

Shane Settle

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Fingerspelling Detection in American Sign Language.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

A Cross-Task Analysis of Text Span Representations.

Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Few-Shot Acoustic Event Detection Via Meta Learning.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training.

CoRR, 2019

Compression of Acoustic Event Detection Models with Quantized Distillation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Fingerspelling Recognition in the Wild With Iterative Visual Attention.

Aurora Martinez Del Rio

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Semi-supervised Acoustic Event Detection Based on Tri-training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

American Sign Language Fingerspelling Recognition in the Wild.

Aurora Martinez Del Rio

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

2017

Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition.
