Marc Delcroix

Christoph Boeddeker

Tsubasa Ochiai

IEEE Signal Process. Mag., November, 2024

Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing].

IEEE Signal Process. Mag., November, 2024

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Applying LLMs for Rescoring N-best ASR Hypotheses of Casual Conversations: Effects of Domain Adaptation and Context Carry-over.

CoRR, 2024

Unveiling the Linguistic Capabilities of a Self-Supervised Speech Model Through Cross-Lingual Benchmark and Layer-Wise Similarity Analysis.

IEEE Access, 2024

Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Investigation of Speaker Representation for Target-Speaker Speech Processing.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Multi-Stream Diffusion Model for Probabilistic Integration of Model-Based and Data-Driven Speech Enhancement.

Proceedings of the 18th International Workshop on Acoustic Signal Enhancement, 2024

Interaural Time Difference Loss for Binaural Target Sound Extraction.

Carlos Hernandez-Olivan

Proceedings of the 18th International Workshop on Acoustic Signal Enhancement, 2024

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Lightweight Zero-shot Text-to-Speech with Mixture of Adapters.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher.

Proceedings of the IEEE International Conference on Acoustics, 2024

NTT Speaker Diarization System for CHiME-7: Multi-Domain, Multi-Microphone End-to-End and Vector Clustering Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Ensemble Inference for Diffusion Model-Based Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2024

Neural Network-Based Virtual Microphone Estimation with Virtual Microphone and Beamformer-Level Multi-Task Loss.

Proceedings of the IEEE International Conference on Acoustics, 2024

Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Probing Self-Supervised Learning Models With Target Speech Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2024

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Discriminative Training of VBx Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

Diffusion Model-Based MIMO Speech Denoising and Dereverberation.

Proceedings of the IEEE International Conference on Acoustics, 2024

How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?

Proceedings of the IEEE International Conference on Acoustics, 2024

Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.

Proceedings of the IEEE International Conference on Acoustics, 2024

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.

Proceedings of the IEEE International Conference on Acoustics, 2024

What Do Self-Supervised Speech and Speaker Models Learn? New Findings from a Cross Model Layer-Wise Analysis.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Neural Target Speech Extraction: An overview.

IEEE Signal Process. Mag., May, 2023

Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.

Jorge Bennasar Vázquez

IEEE ACM Trans. Audio Speech Lang. Process., 2023

MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems.

CoRR, 2023

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection.

IEEE Access, 2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Target Speech Extraction with Conditional Diffusion Model.

Naoyuki Kamo

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Iterative Shallow Fusion of Backward Language Model for End-To-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

On Word Error Rate Definitions and Their Efficient Computation for Multi-Speaker Speech Recognition Systems.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Large Text Corpora For End-To-End Speech Summarization.

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.

Proceedings of the IEEE International Conference on Acoustics, 2023

ESPnet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Joint speaker diarization and speech recognition based on region proposal networks.

Zili Huang

Leibny Paola García-Perera

Desh Raj

Sanjeev Khudanpur

Comput. Speech Lang., 2022

ConceptBeam: Concept Driven Target Speech Extraction.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Analysis of Impact of Emotions on Target Speech Extraction and Speech Separation.

Proceedings of the 17th International Workshop on Acoustic Signal Enhancement, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Target-Speaker ASR with Neural Transducer.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Listen only to me! How well can target speech extraction handle false alarms?

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2022

SA-SDR: A Novel Loss Function for Separation of Meeting Style Data.

Proceedings of the IEEE International Conference on Acoustics, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

Proceedings of the IEEE International Conference on Acoustics, 2022

Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model.

Tomoharu Iwata

Proceedings of the IEEE International Conference on Acoustics, 2022

Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Far-Field Automatic Speech Recognition.

Proc. IEEE, 2021

Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Multimodal Attention Fusion for Target Speaker Extraction.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

PILOT: Introducing Transformers for Probabilistic Sound Event Localization.

Christopher Schymura

Benedikt T. Bönninghoff

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech.

Naohiro Tawara

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Few-Shot Learning of New Sound Classes for Target Sound Extraction.

Jorge Bennasar Vázquez

Tsubasa Ochiai

Shoko Araki

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.

Proceedings of the IEEE International Conference on Acoustics, 2021

Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

Julio Wissing

Benedikt T. Boenninghoff

Proceedings of the IEEE International Conference on Acoustics, 2021

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

Neural Network-Based Virtual Microphone Estimator.

Proceedings of the IEEE International Conference on Acoustics, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.

Proceedings of the IEEE International Conference on Acoustics, 2021

Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds.

Naohiro Tawara

Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker Activity Driven Neural Speech Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

Proceedings of the IEEE International Conference on Acoustics, 2021

Attention-Based Multi-Hypothesis Fusion for Speech Summarization.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Speeding Up Permutation Invariant Training for Source Separation.

Proceedings of the 14th ITG Conference on Speech Communication, online, September 29, 2021

2020

Jointly Optimal Denoising, Dereverberation, and Source Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.

CoRR, 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.

CoRR, 2020

Cognitive-Driven Convolutional Beamforming Using EEG-Based Auditory Attention Decoding.

Proceedings of the 30th IEEE International Workshop on Machine Learning for Signal Processing, 2020

Language Model Data Augmentation Based on Text Domain Transfer.

Atsunori Ogawa

Naohiro Tawara

Nelson Enrique Yalta Soplin

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen to What You Want: Neural Network-Based Universal Sound Selector.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-to-End Training of Time Domain Audio Separation and Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Speaker Discrimination of Target Speech Extraction With Time-Domain SpeakerBeam.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization.

Proceedings of the 28th European Signal Processing Conference, 2020

2019

SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures.

IEEE J. Sel. Top. Signal Process., 2019

Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers.

IEICE Trans. Inf. Syst., 2019

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.

Shigeki Karita

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Unified Framework for Neural Speech Separation and Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2019

All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis.

Proceedings of the IEEE International Conference on Acoustics, 2019

Mask-based MVDR Beamformer for Noisy Multisource Environments: Introduction of Time-varying Spatial Covariance Model.

Proceedings of the IEEE International Conference on Acoustics, 2019

Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.

Proceedings of the IEEE International Conference on Acoustics, 2019

A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models.

Proceedings of the IEEE International Conference on Acoustics, 2019

Compact Network for SpeakerBeam Target Speaker Extraction.

Proceedings of the IEEE International Conference on Acoustics, 2019

Estimation of Sampling Frequency Mismatch between Distributed Asynchronous Microphones under Existence of Source Movements with Stationary Time Periods Detection.

Proceedings of the IEEE International Conference on Acoustics, 2019

Projection Back onto Filtered Observations for Speech Separation with Distributed Microphone Array.

Proceedings of the 8th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 2019

2018

Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Online Integration of DNN-Based and Spatial Clustering-Based Mask Estimation for Robust MVDR Beamforming.

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Comparison of Reference Microphone Selection Algorithms for Distributed Microphone Array Based Speech Enhancement in Meeting Recognition Scenarios.

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Semi-Supervised End-to-End Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Auxiliary Feature Based Adaptation of End-to-end ASR Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Optimization of Speaker-Aware Multichannel Speech Extraction with ASR Criterion.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Listening to Each Speaker One by One with Recurrent Selective Hearing Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Training of Encoder-Decoder Model Using Policy Gradient for End-to-End Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Single Channel Target Speaker Extraction and Recognition with Speaker Beam.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Meeting Recognition with Asynchronous Distributed Microphone Array Using Block-Wise Refinement of Mask-Based MVDR Beamformer.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Progressive Neural Network-based Knowledge Transfer in Acoustic Models.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Factorised Hidden Layer Based Domain Adaptation for Recurrent Neural Network Language Models.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR.

IEEE ACM Trans. Audio Speech Lang. Process., 2017

Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Neural Network-Based Spectrum Estimation for Online WPE Dereverberation.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Forward-Backward Convolutional LSTM for Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Feedback connection for deep neural network-based acoustic modeling.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep mixture density network for statistical model-based feature enhancement.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Probabilistic spatial dictionary based online adaptive beamforming for meeting recognition in noisy and reverberant environments.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming.

Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Learning speaker representation for neural network based multichannel speaker extraction.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Adversarial training for data-driven speech enhancement without parallel corpus.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Meeting recognition with asynchronous distributed microphone array.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Toolkits for Robust Speech Processing.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Preliminaries.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Training Data Augmentation and Data Selection.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research.

EURASIP J. Adv. Signal Process., 2016

Differenced maximum mutual information criterion for robust unsupervised acoustic model adaptation.

Comput. Speech Lang., 2016

Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Joint acoustic factor learning for robust deep neural network based automatic speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Strategies for distant speech recognition in reverberant environments.

EURASIP J. Adv. Signal Process., 2015

Robust i-vector extraction for neural network adaptation in noisy environment.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Text-informed speech enhancement with deep neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

WFST-based structural classification integrating DNN acoustic features and RNN language features for speech recognition.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Context adaptive deep neural networks for fast acoustic model adaptation.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Exploring multi-channel features for denoising-autoencoder-based speech enhancement.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Location Feature Integration for Clustering-Based Speech Separation in Distributed Microphone Arrays.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition.

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement.

IEEE ACM Trans. Audio Speech Lang. Process., 2013

Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer.

Comput. Speech Lang., 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.

Comput. Speech Lang., 2013

The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Conditional emission densities for combining speech enhancement and recognition systems.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Formulation of the REMOS concept from an uncertainty decoding perspective.

Proceedings of the 18th International Conference on Digital Signal Processing, 2013

Feature space variational Bayesian linear regression and its combination with model space VBLR.

Proceedings of the IEEE International Conference on Acoustics, 2013

Unsupervised discriminative adaptation using differenced maximum mutual information based linear regression.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition.

IEEE Signal Process. Mag., 2012

Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective.

IEEE Signal Process. Lett., 2012

Distributed microphone array processing for speech source separation with classifier fusion.

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2012

Dynamic variance adaptation using differenced maximum mutual information.

Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Discriminative feature transforms using differenced maximum mutual information.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Survey on approaches to speech recognition in reverberant environments.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

A Multichannel Feature-Based Processing for Robust Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Single Channel Dereverberation Using Example-Based Speech Enhancement with Uncertainty Decoding Technique.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing.

Proceedings of the Robust Speech Recognition of Uncertain or Missing Data, 2011

2010

Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information.

Proceedings of the Speech Dereverberation, 2010

2009

Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction.

IEEE Trans. Speech Audio Process., 2009

Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing.

IEEE Trans. Speech Audio Process., 2009

2008

Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model.

IEEE Trans. Speech Audio Process., 2008

Calculating Inverse Filters for Speech Dereverberation.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2008

Missing feature speech recognition in a meeting situation with maximum SNR beamforming.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Dereverberation and Denoising Using Multichannel Linear Prediction.

IEEE Trans. Speech Audio Process., 2007

Precise Dereverberation Using Multichannel Linear Prediction.

IEEE Trans. Speech Audio Process., 2007

Inverse Filtering for Speech Dereverberation Less Sensitive to Noise and Room Transfer Function Fluctuations.

EURASIP J. Adv. Signal Process., 2007

Robust blind dereverberation of speech signals based on characteristics of short-time speech segments.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Multi-step linear prediction based speech dereverberation in noisy reverberant environment.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Study on Speech Dereverberation with Autocorrelation Codebook.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

On a Blind Speech Dereverberation Algorithm Using Multi-Channel Linear Prediction.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2006

On the Use of LIME Dereverberation Algorithm in an Acoustic Environment With a Noise Source.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

On robust inverse filter design for room transfer function fluctuations.

Proceedings of the 14th European Signal Processing Conference, 2006

2005

Improved blind dereverberation performance by using spatial information.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Blind Dereverberation based on Estimates of Signal Transmission Channels without Precise Information of Channel Order.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Dereverberation of speech signals based on linear prediction.
