Soumi Maiti
Orcid: 0000-0001-6940-0115
According to our database, Soumi Maiti authored at least 29 papers between 2017 and 2024.
Bibliography
2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024
2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
CoRR, 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens.
CoRR, 2023
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition/Synthesis and Speech/Text Continuation Tasks.
CoRR, 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
CoRR, 2023
CoRR, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023
2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the Interspeech 2022, 2022
2021
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
2018
Proceedings of the Interspeech 2018, 2018
2017
Proceedings of the Advanced Social Interaction with Agents, 2017
Proceedings of the Interspeech 2017, 2017