Soham Deshmukh

CoRR, October, 2025

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence.

CoRR, August, 2025

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder.

CoRR, July, 2025

CoLMbo: Speaker Language Model for Descriptive Profiling.

CoRR, June, 2025

Mellow: a small audio language model for reasoning.

CoRR, March, 2025

ADIFF: Explaining audio difference using natural language.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MACE: Leveraging Audio for Evaluating Audio Captioning Systems.

Satvik Dixit

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Domain Adaptation for Contrastive Audio-Language Models.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PAM: Prompting Audio-Language Models for Audio Quality Assessment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Natural Language Supervision For General-Purpose Audio Representations.

Huaming Wang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Prompting Audios Using Acoustic Properties for Emotion Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Training Audio Captioning Models without Audio.

Dimitra Emmanouilidou

Huaming Wang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model.

CoRR, 2023

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session.

CoRR, 2023

Pengi: An Audio Language Model for Audio Tasks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Audio Retrieval with WavText5K and CLAP Training.

Huaming Wang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-View Learning for Speech Emotion Recognition with Categorical Emotion, Categorical Sentiment, and Dimensional Scores.

Daniel Tompkins

Dimitra Emmanouilidou

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

CLAP Learning Audio Concepts from Natural Language Supervision.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

Describing emotions with acoustic property prompts for speech emotion recognition.

CoRR, 2022

Adapting Task-Oriented Dialogue Models for Email Conversations.

Charles Lee

CoRR, 2022

2021

NaRLE: Natural Language Models using Reinforcement Learning with Emotion Feedback.

CoRR, 2021

Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Detection of Covid-19 Through the Analysis of Vocal Fold Oscillations.

Mahmoud Al Ismail

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Interpreting Glottal Flow Dynamics for Detecting Covid-19 From Voice.

Mahmoud Al Ismail

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection.

CoRR, 2020

2019

Attacker Behaviour Profiling using Stochastic Ensemble of Hidden Markov Models.
