Tsubasa Ochiai

Orcid: 0000-0002-2519-2032

According to our database, Tsubasa Ochiai authored at least 72 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Frontend Token Enhancement for Token-Based Speech Recognition.

[DOI]

Takanori Ashihara

,

Shota Horiguchi

,

,

,

CoRR, February, 2026

Microphone array geometry-independent multi-talker distant ASR: NTT system for DASR task of the CHiME-8 challenge.

[DOI]

,

,

,

,

,

Rintaro Ikeshita

,

Takafumi Moriya

,

Shota Horiguchi

,

,

,

,

Takanori Ashihara

,

,

,

,

Tomohiro Nakatani

,

,

Comput. Speech Lang., 2026

2025

Generic Speech Enhancement with Self-Supervised Representation Space Loss.

[DOI]

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

CoRR, July, 2025

Microphone Array Signal Processing and Deep Learning for Speech Enhancement.

[DOI]

Reinhold Haeb-Umbach

,

Tomohiro Nakatani

,

,

Christoph Boeddeker

,

CoRR, January, 2025

Real-time TSE demonstration via SoundBeam with KD.

[DOI]

,

,

Takafumi Moriya

,

,

,

,

Masahiro Yasuda

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MOVER: Combining Multiple Meeting Recognition Systems.

[DOI]

,

,

,

Tomohiro Nakatani

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Analysis of Semantic and Acoustic Token Variability Across Speech, Music, and Audio Domains.

[DOI]

Takanori Ashihara

,

,

,

,

Shota Horiguchi

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models.

[DOI]

,

Takanori Ashihara

,

,

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model.

[DOI]

Carlos Hernandez-Olivan

,

,

,

Daisuke Niizumi

,

,

Tomohiro Nakatani

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering [Special Issue On Model-Based and Data-Driven Audio Signal Processing].

[DOI]

Reinhold Häb-Umbach

,

Tomohiro Nakatani

,

,

Christoph Boeddeker

,

IEEE Signal Process. Mag., November, 2024

Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing].

[DOI]

,

Shinji Watanabe

,

,

,

,

IEEE Signal Process. Mag., November, 2024

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Investigation of Speaker Representation for Target-Speaker Speech Processing.

[DOI]

Takanori Ashihara

,

Takafumi Moriya

,

Shota Horiguchi

,

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Interaural Time Difference Loss for Binaural Target Sound Extraction.

[DOI]

Carlos Hernandez-Olivan

,

,

,

,

Tomohiro Nakatani

,

Proceedings of the 18th International Workshop on Acoustic Signal Enhancement, 2024

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers.

[DOI]

,

,

,

Tomohiro Nakatani

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.

[DOI]

,

Takafumi Moriya

,

,

Shota Horiguchi

,

,

Takanori Ashihara

,

,

Kentaro Shinayama

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher.

[DOI]

,

,

,

Masahiro Yasuda

,

Shoichiro Saito

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Neural Network-Based Virtual Microphone Estimation with Virtual Microphone and Beamformer-Level Multi-Task Loss.

[DOI]

,

,

,

Tomohiro Nakatani

,

Rintaro Ikeshita

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Target Speech Extraction with Pre-Trained Self-Supervised Learning Models.

[DOI]

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Probing Self-Supervised Learning Models With Target Speech Extraction.

[DOI]

,

,

,

,

Takanori Ashihara

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Neural Target Speech Extraction: An overview.

[DOI]

Katerina Zmolíková

,

,

,

Keisuke Kinoshita

,

,

IEEE Signal Process. Mag., May, 2023

Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking.

[DOI]

,

,

Tomohiro Nakatani

,

IEEE ACM Trans. Audio Speech Lang. Process., 2023

SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning.

[DOI]

,

Jorge Bennasar Vázquez

,

,

Keisuke Kinoshita

,

Yasunori Ohishi

,

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection.

[DOI]

Takafumi Moriya

,

,

,

,

Takahiro Shinozaki

IEEE Access, 2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.

[DOI]

,

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

Kentaro Shinayama

,

,

,

Tomohiro Tanaka

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.

[DOI]

Takafumi Moriya

,

,

,

,

Takanori Ashihara

,

,

Tomohiro Tanaka

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine.

[DOI]

,

,

,

,

,

Tomohiro Nakatani

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

ConceptBeam: Concept Driven Target Speech Extraction.

[DOI]

Yasunori Ohishi

,

,

,

,

,

Daisuke Niizumi

,

,

,

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Analysis of Impact of Emotions on Target Speech Extraction and Speech Separation.

[DOI]

,

Katerina Zmolíková

,

,

,

,

Ladislav Mosner

,

Jan Honza Cernocký

Proceedings of the 17th International Workshop on Acoustic Signal Enhancement, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

[DOI]

,

,

,

Keisuke Kinoshita

,

Takafumi Moriya

,

Naoki Makishima

,

,

Tomohiro Tanaka

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Target-Speaker ASR with Neural Transducer.

[DOI]

Takafumi Moriya

,

,

,

,

Takahiro Shinozaki

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model.

[DOI]

,

Katerina Zmolíková

,

,

,

,

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Listen only to me! How well can target speech extraction handle false alarms?

[DOI]

,

Keisuke Kinoshita

,

,

Katerina Zmolíková

,

,

Tomohiro Nakatani

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

[DOI]

,

,

,

Keisuke Kinoshita

,

,

Takafumi Moriya

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Multimodal Attention Fusion for Target Speaker Extraction.

[DOI]

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

PILOT: Introducing Transformers for Probabilistic Sound Event Localization.

[DOI]

Christopher Schymura

,

Benedikt T. Bönninghoff

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

,

Dorothea Kolossa

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

[DOI]

,

,

,

Keisuke Kinoshita

,

Takafumi Moriya

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

[DOI]

Takafumi Moriya

,

Tomohiro Tanaka

,

Takanori Ashihara

,

,

,

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Few-Shot Learning of New Sound Classes for Target Sound Extraction.

[DOI]

,

Jorge Bennasar Vázquez

,

,

Keisuke Kinoshita

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.

[DOI]

,

Christoph Böddeker

,

Shinji Watanabe

,

Tomohiro Nakatani

,

,

Keisuke Kinoshita

,

,

,

Reinhold Haeb-Umbach

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain.

[DOI]

,

Benedikt T. Boenninghoff

,

Dorothea Kolossa

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

,

Christopher Schymura

Proceedings of the IEEE International Conference on Acoustics, 2021

Neural Network-Based Virtual Microphone Estimator.

[DOI]

,

,

Tomohiro Nakatani

,

Rintaro Ikeshita

,

Keisuke Kinoshita

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

[DOI]

Takafumi Moriya

,

Takanori Ashihara

,

Tomohiro Tanaka

,

,

,

,

,

,

Yusuke Shinohara

Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker Activity Driven Neural Speech Extraction.

[DOI]

,

Katerina Zmolíková

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2021

Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation.

[DOI]

Christoph Böddeker

,

,

Tomohiro Nakatani

,

Keisuke Kinoshita

,

,

,

,

,

Reinhold Haeb-Umbach

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.

[DOI]

Christoph Böddeker

,

,

Tomohiro Nakatani

,

Keisuke Kinoshita

,

,

,

,

,

Shinji Watanabe

,

Reinhold Haeb-Umbach

CoRR, 2020

Listen to What You Want: Neural Network-Based Universal Sound Selector.

[DOI]

,

,

,

,

Keisuke Kinoshita

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

[DOI]

Takafumi Moriya

,

,

,

,

Tomohiro Tanaka

,

Takanori Ashihara

,

,

Yusuke Shinohara

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking.

[DOI]

Christopher Schymura

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

,

Dorothea Kolossa

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer.

[DOI]

,

,

Rintaro Ikeshita

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation.

[DOI]

Tomohiro Nakatani

,

,

,

Keisuke Kinoshita

,

Rintaro Ikeshita

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network.

[DOI]

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam.

[DOI]

,

,

Katerina Zmolíková

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization.

[DOI]

Christopher Schymura

,

,

,

Keisuke Kinoshita

,

Tomohiro Nakatani

,

,

Dorothea Kolossa

Proceedings of the 28th European Signal Processing Conference, 2020

2019

SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures.

[DOI]

Katerina Zmolíková

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

,

,

IEEE J. Sel. Top. Signal Process., 2019

Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.

[DOI]

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition.

[DOI]

,

Shinji Watanabe

,

,

Keisuke Kinoshita

,

,

,

Tomohiro Nakatani

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Unified Framework for Neural Speech Separation and Extraction.

[DOI]

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2019

Compact Network for Speakerbeam Target Speaker Extraction.

[DOI]

,

Katerina Zmolíková

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

ESPnet: End-to-End Speech Processing Toolkit.

[DOI]

Shinji Watanabe

,

,

,

,

,

,

Nelson Enrique Yalta Soplin

,

,

Matthew Wiesner

,

,

Adithya Renduchintala

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speaker Adaptation for Multichannel End-to-End Speech Recognition.

[DOI]

,

Shinji Watanabe

,

Shigeru Katagiri

,

,

John R. Hershey

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming.

[DOI]

,

Shinji Watanabe

,

,

John R. Hershey

,

IEEE J. Sel. Top. Signal Process., 2017

Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR.

[DOI]

,

Shinji Watanabe

,

Shigeru Katagiri

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Multichannel End-to-end Speech Recognition.

[DOI]

,

Shinji Watanabe

,

,

John R. Hershey

Proceedings of the 34th International Conference on Machine Learning, 2017

Automatic node selection for Deep Neural Networks using Group Lasso regularization.

[DOI]

,

Shigeki Matsuda

,

Hideyuki Watanabe

,

Shigeru Katagiri

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models.

[DOI]

,

,

Keisuke Kinoshita

,

,

,

Shigeru Katagiri

,

Tomohiro Nakatani

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers.

[DOI]

,

Shigeki Matsuda

,

Hideyuki Watanabe

,

,

,

,

Shigeru Katagiri

IEICE Trans. Inf. Syst., 2016

Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer.

[DOI]

,

Shigeki Matsuda

,

Hideyuki Watanabe

,

,

,

Shigeru Katagiri

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Speaker adaptive training for deep neural networks embedding linear transformation networks.

[DOI]

,

Shigeki Matsuda

,

Hideyuki Watanabe

,

,

,

Shigeru Katagiri

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Speaker Adaptive Training using Deep Neural Networks.

[DOI]

,

Shigeki Matsuda

,

,

,

Shigeru Katagiri

Proceedings of the IEEE International Conference on Acoustics, 2014
