Soumi Maiti

Orcid: 0000-0001-6940-0115

According to our database, Soumi Maiti authored at least 37 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

TMT: Tri-Modal Translation Between Speech, Image, and Text by Processing Different Modalities as Different Languages.

IEEE Trans. Multim., 2026

ASVspoof 5: Design, collection and validation of resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech.

Comput. Speech Lang., 2026

2025

The Text-to-speech in the Wild (TITW) Database.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech.

Dataset, December, 2024

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data.

CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.

CoRR, 2024

IndicMOS: Multilingual MOS Prediction for 7 Indian languages.

Sathvik Udupa

Soumi Maiti

Prasanta Kumar Ghosh

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech.

CoRR, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study.

CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.

Proceedings of the IEEE International Conference on Acoustics, 2023

FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TriniTTS: Pitch-controllable End-to-end TTS without External Aligner.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Speech Enhancement Using Speech Synthesis Techniques.

Soumi Maiti

PhD thesis, 2021

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data.

Soumi Maiti

Erik Marchi

Alistair Conkie

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.

Soumi Maiti

Michael I. Mandel

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Parametric Resynthesis With Neural Vocoders.

Soumi Maiti

Michael I. Mandel

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Denoising by Parametric Resynthesis.

Soumi Maiti

Michael I. Mandel

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Large Vocabulary Concatenative Resynthesis.

Soumi Maiti

Joey Ching

Michael I. Mandel

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Predicting Interaction Quality in Customer Service Dialogs.

Svetlana Stoyanchev

Soumi Maiti

Srinivas Bangalore

Proceedings of the Advanced Social Interaction with Agents, 2017

Concatenative Resynthesis Using Twin Networks.

Soumi Maiti

Michael I. Mandel

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
