Leda Sari

Venkatesh Ravichandran

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech.

CoRR, 2024

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of a Multilingual ASR Model.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Augmenting text for spoken language understanding with Large Language Models.

CoRR, 2023

Towards Selection of Text-to-speech Data to Augment ASR Training.

CoRR, 2023

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition.

CoRR, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Biased Self-supervised Learning for ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Self-Supervised Representations for Singing Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Seamless equal accuracy ratio for inclusive CTC speech recognition.

Santhosh Kumar Ramakrishnan

Chang D. Yoo

Speech Commun., 2022

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions.

Proceedings of the IEEE International Conference on Acoustics, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Learning speech embeddings for speaker adaptation and speech understanding

PhD thesis, 2021

Counterfactually Fair Automatic Speech Recognition.

Chang D. Yoo

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection.

Santhosh Kumar Ramakrishnan

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

CoRR, 2021

Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering.

Kiran Ramnath

Chang D. Yoo

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

A Multi-View Approach to Audio-Visual Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Identify Speakers in Cocktail Parties with End-to-End Attention.

Junzhe Zhu

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Deep F-Measure Maximization for End-to-End Speech Understanding.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Training Spoken Language Understanding Systems with Non-Parallel Speech and Text.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks.

Mark A. Hasegawa-Johnson

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News.

Michael Picheny

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR.

Kumaran S

Georg Stemmer

Krishnakumar N. Nair

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2016

Score normalization for keyword search.

Murat Saraçlar

Proceedings of the 24th Signal Processing and Communication Application Conference, 2016

Template-based Keyword Search with pseudo posteriorgrams.

Proceedings of the 24th Signal Processing and Communication Application Conference, 2016

2015

Discriminative training of the keyword search confusion model.

Murat Saraçlar

Proceedings of the 2015 23rd Signal Processing and Communications Applications Conference (SIU), 2015

Posteriorgram based approaches in keyword search.

Batuhan Gündogdu

Murat Saraçlar

Proceedings of the 2015 23rd Signal Processing and Communications Applications Conference (SIU), 2015

Fusion of LVCSR and posteriorgram based keyword search.

Batuhan Gündogdu

Murat Saraçlar

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Texture Defect Detection Using Independent Vector Analysis in Wavelet Domain.
