Nima Mesgarani

Junkai Wu

Vishal Choudhari

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Neuro2Semantic: A Transfer Learning Framework for Semantic Reconstruction of Continuous Language from Human Intracranial EEG.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis.

Adrian Nicolas Florea

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Decoding the Unintelligible: Neural Speech Tracking in Low Signal-to-Noise Ratios.

Xiaomin He

Vinay S. Raghavan

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Contextual feature extraction hierarchies converge in large language models and the brain.

Nat. Mach. Intell., 2024

Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning.

CoRR, 2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation.

CoRR, 2024

DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs.

Cynthia R. Steinhardt

Menoua Keshishian

Kim Stachenfeld

CoRR, 2024

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience.

CoRR, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify And Understand Speaker in Spoken Dialogue.

Mark Hasegawa-Johnson

Mari Ostendorf

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SSAMBA: Self-Supervised Audio Representation Learning With Mamba State Space Model.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation.

Proceedings of the IEEE International Conference on Acoustics, 2024

Decoding auditory attention for real-time BCI control.

Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

2023

naplib-python: Neural acoustic data processing and analysis tools in python.

Softw. Impacts, September, 2023

Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex.

NeuroImage, 2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform.

CoRR, 2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneme-Level BERT for Enhanced Prosody of Text-To-Speech with Grapheme Predictions.

Proceedings of the IEEE International Conference on Acoustics, 2023

Online Binaural Speech Separation Of Moving Speakers With A Wavesplit Network.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation.

Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

2022

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer From Style-Based TTS Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

2021

Group Communication With Context Codec for Lightweight Source Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Neural representation of linguistic feature hierarchy reflects second-language proficiency.

Giovanni M. Di Liberto

NeuroImage, 2021

Functional characterization of human Heschl's gyrus in response to natural speech.

NeuroImage, 2021

Distortion-Controlled Training for end-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.

Menoua Keshishian

Samuel Norman-Haignere

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Filter-and-Sum Network for End-to-End Multi-Channel Speech Separation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Empirical Analysis of Generalized Iterative Speech Separation Networks.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.

Ali Zare

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra-Lightweight Speech Separation Via Group Communication.

Proceedings of the IEEE International Conference on Acoustics, 2021

Rethinking The Separation Layers In Speech Separation Networks.

Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker and Direction Inferred Dual-Channel Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception.

NeuroImage, 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.

CoRR, 2020

Group Communication with Context Codec for Ultra-Lightweight Source Separation.

CoRR, 2020

Implicit Filter-and-sum Network for Multi-channel Speech Separation.

CoRR, 2020

Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Real-Time Binaural Speech Separation with Preserved Spatial Cues.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Online Deep Attractor Network for Real-time Single-channel Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2019

Augmented Time-frequency Mask Estimation in Cluster-based Source Separation Algorithms.

Proceedings of the IEEE International Conference on Acoustics, 2019

FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Speaker-Independent Speech Separation With Deep Attractor Network.

Zhuo Chen

IEEE ACM Trans. Audio Speech Lang. Process., 2018

TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation.

CoRR, 2018

Speech Processing in the Human Brain Meets Deep Learning.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Music Source Activity Detection and Separation Using Deep Attractor Network.

Rajath Kumar

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Understanding the Representation and Computation of Multilayer Perceptrons: A Case Study in Speech Recognition.

Proceedings of the 34th International Conference on Machine Learning, 2017

Deep clustering and conventional networks for music separation: Stronger together.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

NAPLib: An open source toolbox for real-time and offline Neural Acoustic Processing.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep attractor network for single-microphone speaker separation.

Zhuo Chen

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Neural decoding of attentional selection in multi-speaker environments without access to separated sources.

Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017

2016

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models.

Michael L. Seltzer

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations.

Zhuo Chen

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Synaptic depression in deep neural networks for speech processing.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Designing a hands-on brain computer interface laboratory course.

Bahar Khalighinejad

Laura Kathleen Long

Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Analyzing distributional learning of phonemic categories in unsupervised deep neural networks.

Okko Räsänen

Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

2015

Keynote addresses: Reverse engineering the neural mechanisms involved in robust speech processing.

Mark D. Plumbley

Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Speech reconstruction from human auditory cortex with deep neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Exploring how deep neural networks form phonemic categories.

Michael L. Seltzer

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Stimulus Reconstruction from Cortical Responses.

Proceedings of the Encyclopedia of Computational Neuroscience, 2014

Principal components of auditory spectro-temporal receptive fields.

Nagaraj Mahajan

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Developing a speaker identification system for the DARPA RATS project.

Sri Harish Reddy Mallidi

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Acoustic and Data-driven Features for Robust Speech Activity Detection.

Sri Harish Reddy Mallidi

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Developing a Speech Activity Detection System for the DARPA RATS Program.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speech and speaker separation in human auditory cortex.

Edward Chang

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The UMD-JHU 2011 speaker recognition system.

Daniel Garcia-Romero

Xinhui Zhou

Dmitry N. Zotkin

Balaji Vasan Srinivasan

Yuancheng Luo

Sriram Ganapathy

Garimella S. V. S. Sivaram

Sridhar Krishna Nemala

Majid Mirbagheri

Sri Harish Reddy Mallidi

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Performance monitoring for robustness in automatic recognition of speech.

Proceedings of the 2011 Symposium on Machine Learning in Speech and Language Processing, 2011

Adaptive Stream Fusion in Multistream Recognition of Speech.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech processing with a cortical representation of audio.

Garimella S. V. S. Sivaram

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Data-Driven and Feedback Based Spectro-Temporal Features for Speech Recognition.

Sridhar Krishna Nemala

IEEE Signal Process. Lett., 2010

A computational model of rapid task-related plasticity of auditory cortical receptive fields.

Jonathan B. Fritz

J. Comput. Neurosci., 2010

The use of spike-based representations for hardware audition systems.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

A phoneme recognition framework based on auditory spectro-temporal receptive fields.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A multistream multiresolution framework for phoneme recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Nonlinear filtering of spectrotemporal modulations in speech enhancement.

Majid Mirbagheri

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Discriminant spectrotemporal features for phoneme recognition.

Garimella S. V. S. Sivaram

Sridhar Krishna Nemala

Mounya Elhilali

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008

Representation of speech in the primary auditory cortex and its implications for robust speech processing.

PhD thesis, 2008

2007

Denoising in the Domain of Spectrotemporal Modulations.

EURASIP J. Audio Speech Music. Process., 2007

Representation of Phonemes in Primary Auditory Cortex: How the Brain Analyzes Speech.

Stephen V. David

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations.

Malcolm Slaney

IEEE Trans. Speech Audio Process., 2006

Discriminating speech and non-speech with regularized least squares.

Ryan Rifkin

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

Speech Enhancement Based on Filtering the Spectrotemporal Modulations.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Speech discrimination based on multiscale spectro-temporal modulations.
