Hiroshi Sato

Affiliations:
  • NTT Corporation, NTT Media Intelligence Laboratories, Japan


According to our database, Hiroshi Sato authored at least 48 papers between 2012 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Microphone array geometry-independent multi-talker distant ASR: NTT system for DASR task of the CHiME-8 challenge.
Comput. Speech Lang., 2026

2025
Generic Speech Enhancement with Self-Supervised Representation Space Loss.
CoRR, July, 2025

Real-time TSE demonstration via SoundBeam with KD.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Attention-Free Dual-Mode ASR with Latency-Controlled Selective State Spaces.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Alignment-Free Training for Transducer-based Multi-Talker ASR.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Guided Speaker Embedding.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Investigation of Speaker Representation for Target-Speaker Speech Processing.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?
Proceedings of the IEEE International Conference on Acoustics, 2024

Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection.
IEEE Access, 2023

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

End-to-End Joint Target and Non-Target Speakers ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Scheduled Sampling for Neural Transducer-Based ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Target-Speaker ASR with Neural Transducer.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Listen only to me! How well can target speech extraction handle false alarms?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
Proceedings of the IEEE International Conference on Acoustics, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Multimodal Attention Fusion for Target Speaker Extraction.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Self-Distillation for Improving CTC-Transformer-Based ASR Systems.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Distilling Attention Weights for CTC-Based ASR Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Neural Whispered Speech Detection with Imbalanced Learning.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Revisiting Dynamic Adjustment of Language Model Scaling Factor for Automatic Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2016
GPS Trajectory Data Enrichment based on a Latent Statistical Model.
Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, 2016

2015
Top of worlds: estimating time complexity of calculating rank order in multi-dimensional hierarchical sets.
Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers, 2015

2014
Missing sensor value estimation method for participatory sensing environment.
Proceedings of the IEEE International Conference on Pervasive Computing and Communications, 2014

2013
A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An evaluation of method for encouraging participation.
Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2013

2012
Top of worlds: method for improving motivation to participate in sensing services.
Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 2012

Online Top-k Similar Time-Lagged Pattern Pair Search in Multiple Time Series.
Proceedings of the Database and Expert Systems Applications, 2012

Distributed Sampling Storage for Statistical Analysis of Massive Sensor Data.
Proceedings of the Multidisciplinary Research and Practice for Information Systems, 2012

