Hiroshi Sato

Affiliations:

NTT Corporation, NTT Media Intelligence Laboratories, Japan

According to our database¹, Hiroshi Sato authored at least 48 papers between 2012 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Microphone array geometry-independent multi-talker distant ASR: NTT system for DASR task of the CHiME-8 challenge.

[DOI]

,

,

,

,

,

Rintaro Ikeshita

,

Takafumi Moriya

,

Shota Horiguchi

,

,

,

,

Takanori Ashihara

,

,

,

,

Tomohiro Nakatani

,

,

Comput. Speech Lang., 2026

2025

Generic Speech Enhancement with Self-Supervised Representation Space Loss.

[DOI]

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

CoRR, July, 2025

Real-time TSE demonstration via SoundBeam with KD.

[DOI]

,

,

Takafumi Moriya

,

,

,

,

Masahiro Yasuda

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Attention-Free Dual-Mode ASR with Latency-Controlled Selective State Spaces.

[DOI]

Takafumi Moriya

,

,

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Alignment-Free Training for Transducer-based Multi-Talker ASR.

[DOI]

Takafumi Moriya

,

Shota Horiguchi

,

,

,

Takanori Ashihara

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Guided Speaker Embedding.

[DOI]

Shota Horiguchi

,

Takafumi Moriya

,

,

Takanori Ashihara

,

,

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR.

[DOI]

Takafumi Moriya

,

,

Tomohiro Tanaka

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings.

[DOI]

Shota Horiguchi

,

,

Takafumi Moriya

,

Takanori Ashihara

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Investigation of Speaker Representation for Target-Speaker Speech Processing.

[DOI]

Takanori Ashihara

,

Takafumi Moriya

,

Shota Horiguchi

,

,

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.

[DOI]

,

Takafumi Moriya

,

,

Shota Horiguchi

,

,

Takanori Ashihara

,

,

Kentaro Shinayama

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding.

[DOI]

Takafumi Moriya

,

Takanori Ashihara

,

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

Proceedings of the IEEE International Conference on Acoustics, 2024

Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.

[DOI]

,

,

Takanori Ashihara

,

Hiroki Kanagawa

,

,

Takafumi Moriya

,

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection.

[DOI]

Takafumi Moriya

,

,

,

,

Takahiro Shinozaki

IEEE Access, 2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.

[DOI]

,

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

Kentaro Shinayama

,

,

,

Tomohiro Tanaka

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.

[DOI]

Takafumi Moriya

,

,

,

,

Takanori Ashihara

,

,

Tomohiro Tanaka

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Joint Target and Non-Target Speakers ASR.

[DOI]

,

Naoki Makishima

,

,

Yoshihiko Yamazaki

,

,

,

,

,

,

Tomohiro Tanaka

,

Akihiko Takashima

,

,

Takafumi Moriya

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.

[DOI]

Tomohiro Tanaka

,

,

,

,

,

Takanori Ashihara

,

,

Takafumi Moriya

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Scheduled Sampling for Neural Transducer-Based ASR.

[DOI]

Takafumi Moriya

,

Takanori Ashihara

,

,

,

Tomohiro Tanaka

,

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.

[DOI]

,

,

Akihiko Takashima

,

,

Naoki Makishima

,

,

Takafumi Moriya

,

Takanori Ashihara

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.

[DOI]

Tomohiro Tanaka

,

,

,

,

,

Takanori Ashihara

,

Takafumi Moriya

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.

[DOI]

,

,

,

Keisuke Kinoshita

,

Takafumi Moriya

,

Naoki Makishima

,

,

Tomohiro Tanaka

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Target-Speaker ASR with Neural Transducer.

[DOI]

Takafumi Moriya

,

,

,

,

Takahiro Shinozaki

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.

[DOI]

,

Yoshihiro Yamazaki

,

,

Naoki Makishima

,

,

,

,

Tomohiro Tanaka

,

Akihiko Takashima

,

,

,

Takafumi Moriya

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.

[DOI]

,

,

,

Rintaro Ikeshita

,

,

,

Shigeru Katagiri

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Listen only to me! How well can target speech extraction handle false alarms?

[DOI]

,

Keisuke Kinoshita

,

,

Katerina Zmolíková

,

,

Tomohiro Nakatani

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.

[DOI]

,

,

,

Keisuke Kinoshita

,

,

Takafumi Moriya

Proceedings of the IEEE International Conference on Acoustics, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

[DOI]

Takafumi Moriya

,

Takanori Ashihara

,

,

,

Tomohiro Tanaka

,

,

,

,

Takahiro Shinozaki

Proceedings of the IEEE International Conference on Acoustics, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.

[DOI]

,

,

,

,

Naoki Makishima

,

Takafumi Moriya

,

Takanori Ashihara

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Multimodal Attention Fusion for Target Speaker Extraction.

[DOI]

,

,

Keisuke Kinoshita

,

,

Tomohiro Nakatani

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.

[DOI]

,

,

,

Keisuke Kinoshita

,

Takafumi Moriya

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

[DOI]

Takafumi Moriya

,

Tomohiro Tanaka

,

Takanori Ashihara

,

,

,

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

[DOI]

Takafumi Moriya

,

Takanori Ashihara

,

Tomohiro Tanaka

,

,

,

,

,

,

Yusuke Shinohara

Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.

[DOI]

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Self-Distillation for Improving CTC-Transformer-Based ASR Systems.

[DOI]

Takafumi Moriya

,

,

,

,

Tomohiro Tanaka

,

Takanori Ashihara

,

,

Yusuke Shinohara

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Distilling Attention Weights for CTC-Based ASR Systems.

[DOI]

Takafumi Moriya

,

,

Tomohiro Tanaka

,

Takanori Ashihara

,

,

Yusuke Shinohara

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.

[DOI]

,

,

Tomohiro Tanaka

,

Takafumi Moriya

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Whispered Speech Detection with Imbalanced Learning.

[DOI]

Takanori Ashihara

,

Yusuke Shinohara

,

,

Takafumi Moriya

,

,

Takaaki Fukutomi

,

Yoshikazu Yamaguchi

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Revisiting Dynamic Adjustment of Language Model Scaling Factor for Automatic Speech Recognition.

[DOI]

,

Takafumi Moriya

,

Yusuke Shinohara

,

,

Takaaki Fukutomi

,

,

Takanori Ashihara

,

Yoshikazu Yamaguchi

,

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2016

GPS Trajectory Data Enrichment based on a Latent Statistical Model.

[DOI]

Akira Kinoshita

,

Atsuhiro Takasu

,

,

,

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, 2016

2015

Top of worlds: estimating time complexity of calculating rank order in multi-dimensional hierarchical sets.

[DOI]

,

Hitoshi Kawasaki

,

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Akihiro Tsutsui

Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers, 2015

2014

Missing sensor value estimation method for participatory sensing environment.

[DOI]

Hisashi Kurasawa

,

,

Atsushi Yamamoto

,

Hitoshi Kawasaki

,

Motonori Nakamura

,

,

Hajime Matsumura

Proceedings of the IEEE International Conference on Pervasive Computing and Communications, 2014

2013

A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion.

[DOI]

Takuto Moriguchi

,

,

,

,

,

,

Satoshi Nakamura

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An evaluation of method for encouraging participation.

[DOI]

Hitoshi Kawasaki

,

Atsushi Yamamoto

,

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2013

2012

Top of worlds: method for improving motivation to participate in sensing services.

[DOI]

Hitoshi Kawasaki

,

Atsushi Yamamoto

,

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Hajime Matsumura

Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 2012

Online Top-k Similar Time-Lagged Pattern Pair Search in Multiple Time Series.

[DOI]

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Hajime Matsumura

Proceedings of the Database and Expert Systems Applications, 2012

Distributed Sampling Storage for Statistical Analysis of Massive Sensor Data.

[DOI]

,

Hisashi Kurasawa

,

,

Motonori Nakamura

,

Hajime Matsumura

,

Kei'ichi Koyanagi

Proceedings of the Multidisciplinary Research and Practice for Information Systems, 2012
