Aswin Shanmugam Subramanian

CoRR, June, 2025

PHRASED: Phrase Dictionary Biasing for Speech Translation.

Jinyu Li

CoRR, June, 2025

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation.

CoRR, February, 2025

Length Aware Speech Translation for Video Dubbing.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings.

Christoph Böddeker

Reinhold Haeb-Umbach

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Late Audio-Visual Fusion for in-the-Wild Speaker Diarization.

Zexu Pan

François G. Germain

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks.

Darius Petermann

Zhong-Qiu Wang

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Hyperbolic Audio Source Separation.

Darius Petermann

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Reverberation as Supervision For Speech Separation.

Rohith Aralikatti

Christoph Böddeker

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.

Comput. Speech Lang., 2022

Towards End-to-end Speaker Diarization in the Wild.

Zexu Pan

François G. Germain

CoRR, 2022

Heterogeneous Target Speech Separation.

Efthymios Tzinis

Paris Smaragdis

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improved Domain Generalization via Disentangled Multi-Task Learning in Unsupervised Anomalous Sound Detection.

Satvik Venkatesh

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.

Chenda Li

Jing Shi

Wangyou Zhang

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers.

Arun Baby

Jeena J. Prakash

Speech Commun., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.

Wangyou Zhang

CoRR, 2020

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge.

Ashish Arora

Desh Raj

CoRR, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.

Wangyou Zhang

Xuankai Chang

Yanmin Qian

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End ASR with Adaptive Span Self-Attention.

Xuankai Chang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Attention-Based ASR with Lightweight and Dynamic Convolutions.

Yuya Fujita

Motoi Omachi

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR.

CoRR, 2019

Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition.

Toru Taniguchi

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Enhancement Using End-to-End Speech Recognition Objectives.

Xiaofei Wang

Murali Karthick Baskar

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

2018

Student-Teacher Learning for BLSTM Mask-based Speech Enhancement.

Szu-Jui Chen

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline.

Szu-Jui Chen

Hainan Xu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Significance of Pseudo-syllables in building better acoustic models for Indian English TTS.

S. Rupak Vignesh

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Building speech synthesis systems for Indian languages.

Proceedings of the Twenty First National Conference on Communications, 2015

Blizzard Challenge 2015 : Submission by DONLab, IIT Madras.

Anusha Prakash

Arun Baby

Rupak Vignesh Swaminathan

Jeena J. Prakash

N. L. Nishanthi

Raghava Krishnan

Proceedings of the Blizzard Challenge 2015, 2015

2014

Group delay based phone segmentation for HTS.

Proceedings of the Twentieth National Conference on Communications, 2014

A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

IIT Madras's Submission to the Blizzard Challenge 2014.

Raghava Krishnan

Anusha Prakash

G. R. Kasthuri

Proceedings of the Blizzard Challenge 2014, Singapore, Singapore, September 19, 2014, 2014

2013

A common attribute based unified HTS framework for speech synthesis in Indian languages.

Mahesh Kumar Nandwana

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

A syllable based statistical text to speech system.

Abhijit Pradhan

Anusha Prakash

Kamakoti Veezhinathan