Atsushi Ando

ORCID: 0000-0002-3971-0654

According to our database, Atsushi Ando authored at least 53 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Microphone array geometry-independent multi-talker distant ASR: NTT system for DASR task of the CHiME-8 challenge.

Comput. Speech Lang., 2026

2025

Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering.

CoRR, June, 2025

Pretraining Multi-Speaker Identification for Neural Speaker Diarization.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Mitigating Non-Target Speaker Bias in Guided Speaker Embedding.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Multi-channel Speaker Counting for EEND-VC-based Speaker Diarization on Multi-domain Conversation.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Mamba-based Segmentation Model for Speaker Diarization.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Guided Speaker Embedding.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Speech Emotion Recognition Based on Large-Scale Automatic Speech Recognizer.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Predictive ASR and Turn-taking Prediction at Once: Towards More Responsive Spoken Dialog System.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.

Kenichi Fujita

Atsushi Ando

Yusuke Ijima

IEICE Trans. Inf. Syst., January, 2024

Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Unified Multi-Talker ASR with and without Target-speaker Enrollment.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Factor-Conditioned Speaking-Style Captioning.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

NTT Speaker Diarization System for Chime-7: Multi-Domain, Multi-Microphone end-to-end and Vector Clustering Diarization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.

Proceedings of the ACM Multimedia Asia 2023, 2023

End-to-End Joint Target and Non-Target Speakers ASR.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.

Proceedings of the IEEE International Conference on Image Processing, 2023

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.

IEEE Access, 2022

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

Speech Emotion Recognition in Real Environments using Characteristics of Emotional Expression and Perception.

Atsushi Ando

PhD thesis, 2021

Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis.

Kenichi Fujita

Atsushi Ando

Yusuke Ijima

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Speaker Age Estimation Using Age-Dependent Insensitive Loss.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speech Emotion Recognition Based on Multi-Label Emotion Existence Model.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data.

Ryuichiro Higashinaka

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Urgent Voicemail Detection Focused on Long-term Temporal Variation.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Likability Estimation of Call-center Agents by Suppressing Annotator Variability.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

Neural Dialogue Context Online End-of-Turn Detection.

Ryuichiro Higashinaka

Yushi Aono

Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Online Call Scene Segmentation of Contact Center Dialogues based on Role Aware Hierarchical LSTM-RNNs.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Interaction and Transition Model for Speech Emotion Recognition in Dialogue.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Robust children and adults speech identification and confidence measure based on DNN posteriorgram.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Speaker recognition in duration-mismatched condition using bootstrapped i-vectors.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
