Michael I. Mandel

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Data-Centric Methods for Environmental Sound Classification With Limited Labels.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models.

Enis Berk Çoban

CoRR, 2024

emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Towards High Resolution Weather Monitoring With Sound Data.

Enis Berk Çoban

Megan Perra

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models.

Ali Raza Syed

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

ImportantAug: A Data Augmentation Agent for Speech.

Hassan Salami Kavaki

Proceedings of the IEEE International Conference on Acoustics, 2022

EDANSA-2019: The Ecoacoustic Dataset from Arctic North Slope Alaska.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

Directly Comparing the Listening Strategies of Humans and Machines.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Towards Large Scale Ecoacoustic Monitoring with Small Amounts of Labeled Data.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

2020

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement.

CoRR, 2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks.

CoRR, 2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks.

CoRR, 2020

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings.

CoRR, 2020

Large Scale Evaluation of Importance Maps in Automatic Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Identifying Important Time-Frequency Locations in Continuous Speech Utterances.

Hassan Salami Kavaki

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mask-Dependent Phase Estimation for Monaural Speaker Separation.

Zhaoheng Ni

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Transfer Learning from Youtube Soundtracks to Tag Arctic Ecoacoustic Recordings.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Onssen: an open-source speech separation and enhancement library.

Zhaoheng Ni

CoRR, 2019

Parametric Resynthesis With Neural Vocoders.

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Denoising by Parametric Resynthesis.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Unusable Spoken Response Detection with BLSTM Neural Networks.

David Suendermann-Oeft

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Bubble Cooperative Networks for Identifying Important Speech Cues.

Brian McFee

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Concatenative Resynthesis with Improved Training Signals for Speech Enhancement.

Ali Raza Syed

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Large Vocabulary Concatenative Resynthesis.

Joey Ching

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Concatenative Resynthesis Using Twin Networks.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Active learning for low-resource speech recognition: Impact of selection size and language modeling data.

Ali Raza Syed

Andrew Rosenberg

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An evaluation of score-informed methods for estimating fundamental frequency and power from polyphonic audio.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Analyzing Human and Machine Performance In Resolving Ambiguous Spoken Sentences.

Hussein Ghaly

Proceedings of the Workshop on Speech-Centric Natural Language Processing, 2017

Confused or not Confused?: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks.

Proceedings of the 8th ACM International Conference on Bioinformatics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Multichannel Spatial Clustering Using Model-Based Source Separation.

Jon P. Barker

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions.

Jon Barker

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Directly Comparing the Listening Strategies of Humans and Machines.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep beamforming networks for multi-channel speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Exciting estimated clean spectra for speech resynthesis.

Sreyas Srimath Tirumala

Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Audio super-resolution using concatenative resynthesis.

Young Suk Cho

Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Enforcing consistency in spectral masks using Markov random fields.

Nicoleta Roman

Proceedings of the 23rd European Signal Processing Conference, 2015

Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Generalizing time-frequency importance functions across noises, talkers, and phonemes.

Sarah E. Yoho

Eric W. Healy

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Analysis-by-synthesis feature estimation for robust automatic speech recognition using spectral masks.

Arun Narayanan

Proceedings of the IEEE International Conference on Acoustics, 2014

Learning a concatenative resynthesis system for noise suppression.

Young Suk Cho

Yuxuan Wang

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Gestural Query Specification.

Arnab Nandi

Lilong Jiang

Proc. VLDB Endow., 2013

GestureQuery: A Multitouch Database Query Interface.

Lilong Jiang

Arnab Nandi

Proc. VLDB Endow., 2013

Learning an intelligibility map of individual utterances.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Classification based binaural dereverberation.

Nicoleta Roman

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The interactive join: recognizing gestures for database queries.

Arnab Nandi

Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems, 2013

2012

Learning Algorithms for the Classification Restricted Boltzmann Machine.

J. Mach. Learn. Res., 2012

A Study of Intonation in Three-Part Singing using the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT).

Ichiro Fujinaga

Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012

2011

Contextual tag inference.

ACM Trans. Multim. Comput. Commun. Appl., 2011

Combining localization cues and source model constraints for binaural source separation.

Ron J. Weiss

Speech Commun., 2011

Autotagging music with conditional restricted Boltzmann machines.

CoRR, 2011

Characterizing singing voice fundamental frequency trajectories.

Ichiro Fujinaga

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011

2010

Model-Based Expectation-Maximization Source Separation and Localization.

Ron J. Weiss

IEEE Trans. Speech Audio Process., 2010

Evaluating Source Separation Algorithms With Reverberant Speech.

Barbara G. Shinn-Cunningham

Scott Bressler

IEEE Trans. Speech Audio Process., 2010

Learning Tags that Vary Within a Song.

Douglas Eck

Yoshua Bengio

Proceedings of the 11th International Society for Music Information Retrieval Conference, 2010

Scalable Genre and Tag Prediction with Spectral Covariance.

James Bergstra

Douglas Eck

Proceedings of the 11th International Society for Music Information Retrieval Conference, 2010

2009

The Ideal Interaural Parameter Mask: A bound on binaural separation systems.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Improving MIDI-audio alignment with acoustic features.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009

Evaluation of Algorithms Using Games: The Case of Music Tagging.

Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

2008

Active Learning for Interactive Multimedia Retrieval.

Proc. IEEE, 2008

Multiple-Instance Learning for Music Information Retrieval.

Proceedings of the ISMIR 2008, 2008

Source separation based on binaural cues and source model constraints.

Ron J. Weiss

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Cross-correlation of beat-synchronous representations for music similarity.

Courtenay V. Cotton

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

A Web-Based Game for Collecting Music Metadata.

Proceedings of the 8th International Conference on Music Information Retrieval, 2007

2006

Support vector machine active learning for music retrieval.

Graham E. Poliner

Multim. Syst., 2006

An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments.

Tony Jebara

Proceedings of the Advances in Neural Information Processing Systems 19, 2006

A probability model for interaural phase difference.

Proceedings of the ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition, 2006

2005

Song-Level Features and Support Vector Machines for Music Classification.

Dan Ellis

Proceedings of the ISMIR 2005, 2005

2004

Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation.

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Visual Hand Tracking Using Nonparametric Belief Propagation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2004

Modeling Physical Capabilities of Humanoid Agents Using Motion Capture Data.

Gita Sukthankar