Soumi Maiti

Orcid: 0000-0001-6940-0115

According to our database, Soumi Maiti authored at least 29 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.
CoRR, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024

2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech.
CoRR, 2023

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
CoRR, 2023

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens.
CoRR, 2023

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks.
CoRR, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
CoRR, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study.
CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.
Proceedings of the IEEE International Conference on Acoustics, 2023

FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
Proceedings of the IEEE International Conference on Acoustics, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

TriniTTS: Pitch-controllable End-to-end TTS without External Aligner.
Proceedings of the Interspeech 2022, 2022

2021
Speech Enhancement Using Speech Synthesis Techniques.
PhD thesis, 2021

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Parametric Resynthesis With Neural Vocoders.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Denoising by Parametric Resynthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Large Vocabulary Concatenative Resynthesis.
Proceedings of the Interspeech 2018, 2018

2017
Predicting Interaction Quality in Customer Service Dialogs.
Proceedings of the Advanced Social Interaction with Agents, 2017

Concatenative Resynthesis Using Twin Networks.
Proceedings of the Interspeech 2017, 2017

