Sreyan Ghosh

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ProSE: Diffusion Priors for Speech Enhancement.

Anton Jeran Ratnarajah

Ramani Duraiswami

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Adversarial Speech-Text Pre-Training for Speech Translation.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Large Language Models Are Efficient Learners as Zero-Shot Speech Translators.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds.

Oriol Nieto

Ramani Duraiswami

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation.

Mohammad Sadegh Rasooli

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Multi-Domain Audio Question Answering in the DCASE 2025 Challenge.

Dataset, April, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities.

CoRR, 2024

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions.

S. Ramaneswaran

Sakshi Singh

CoRR, 2024

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap.

CoRR, 2024

Do Vision-Language Models Understand Compound Nouns?

CoRR, 2024

Do Vision-Language Models Understand Compound Nouns?

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

A Closer Look at the Limitations of Instruction Tuning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

FusDom: Combining in-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Stable Distillation: Regularizing Continued Pre-Training for Low-Resource Automatic Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Recap: Retrieval-Augmented Audio Captioning.

Ramani Duraiswami

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

AV-RIR: Audio-Visual Room Impulse Response Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions.

Ramaneswaran S.

S. Sakshi

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

ASPIRE: Language-Guided Augmentation for Robust Image Classification.

CoRR, 2023

BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition.

S. Ramaneswaran

Harshvardhan Srivastava

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AdVerb: Visually Guided Audio Dereverberation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SLICER: Learning Universal Audio Representations Using Low-Resource Self-Supervised Pre-Training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Unfused: Unsupervised Finetuning Using Self Supervised Distillation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Data2vec-Aqc: Search for the Right Teaching Assistant in the Teacher-Student Training Setup.

Vasista Sai Lodagala

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MAST: Multiscale Audio Spectrogram Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

DALE: Generative Data Augmentation for Low-Resource Legal NLP.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Decorrelating Feature Spaces for Learning General-Purpose Audio Representations.

IEEE J. Sel. Top. Signal Process., 2022

A novel multimodal dynamic fusion network for disfluency detection in spoken utterances.

CoRR, 2022

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition.

Lodagala Durga Prasad

Lodagala V. S. V. Durga Prasad

CoRR, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.

CoRR, 2022

A Discourse Aware Sequence Learning Approach for Emotion Recognition in Conversations.

Harshvardhan Srivastava

CoRR, 2022

MMER: Multimodal Multi-task learning for Emotion Recognition in Spoken Utterances.

Harshvardhan Srivastava

CoRR, 2022

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning.

CoRR, 2022

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations.

Vasista Sai Lodagala

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations.

Vasista Sai Lodagala

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Span Extraction Aided Improved Code-mixed Sentiment Classification.

Ramaneswaran S.

Sean Benhur

Proceedings of the Eighth Workshop on Noisy User-generated Text, 2022

2021

Deep Clustering For General-Purpose Audio Representations.

CoRR, 2021

Speech Toxicity Analysis: A New Spoken Language Processing Task.

CoRR, 2021

Cisco at AAAI-CAD21 shared task: Predicting Emphasis in Presentation Slides using Contextualised Embeddings.

CoRR, 2021

Cisco at SemEval-2021 Task 5: What's Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments.

Proceedings of the 15th International Workshop on Semantic Evaluation, 2021

Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets.

Zaki Mustafa Farooqi

Rajiv Ratn Shah

Proceedings of the Working Notes of FIRE 2021, 2021

2020

End-to-End Named Entity Recognition from English Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2016

Structure and stability of noble gas bound EX₃⁺ compounds (E = C, Ge, Sn, Pb; X = H, F, Cl, Br).

Sudip Pan

Diego Moreno

Pratim Kumar Chattaraj

Gabriel Merino

J. Comput. Chem., 2016

Embodied Material Guidance: Augmenting Material for Carving.

Marcel Penz