Yui Sudo

Orcid: 0000-0003-2094-6701

According to our database¹, Yui Sudo authored at least 38 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2026

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization.

CoRR, March, 2026

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment.

CoRR, March, 2026

Distilling LLM Semantic Priors into Encoder-Only Multi-Talker ASR with Talker-Count Routing.

CoRR, March, 2026

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization.

Jianing Yang

Yusuke Fujita

Yui Sudo

CoRR, March, 2026

2025

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

DYNAC: Dynamic Vocabulary-based Non-Autoregressive Contextualization for Speech Recognition.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Is Synthetic Data Truly Effective for Training Speech Language Models?

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Joint Target-Speaker ASR and Activity Detection.

Chikara Maeda

Muhammad Shakeel

Yui Sudo

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Evaluating Japanese Dialect Robustness Across Speech and Text-based Large Language Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Conversation Context-Aware Direct Preference Optimization for Style-Controlled Speech Synthesis.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Online adaptation of fourier series-based acoustic transfer function model and its application to sound source localization and separation.

Adv. Robotics, October, 2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.

CoRR, 2024

Contextualized Automatic Speech Recognition With Dynamic Vocabulary.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Improving Noise Robustness of Automatic Speech Recognition Based on a Parallel Adapter Model with Near-Identity Initialization.

Proceedings of the Advances and Trends in Artificial Intelligence. Theory and Applications, 2024

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search.

Proceedings of the IEEE International Conference on Acoustics, 2024

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation.

Proceedings of the IEEE International Conference on Acoustics, 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Online Adaptation of Fourier Series Based Acoustic Transfer Function Model to Improve Sound Source Localization and Separation.

Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication, 2023

Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation.

Yui Sudo

Kazuya Hata

Kazuhiro Nakadai

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Flexible Evidence Model to Reduce Uncertainty Mismatch Between Speech Enhancement and ASR Based on Encoder-Decoder Architecture.

Ryu Takeda

Yui Sudo

Kazunori Komatani

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022

Empirical Sampling from Latent Utterance-wise Evidence Model for Missing Data ASR based on Neural Encoder-Decoder Model.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Multichannel environmental sound segmentation.

Appl. Intell., 2021

Multi-channel Environmental Sound Segmentation utilizing Sound Source Localization and Separation U-Net.

Proceedings of the IEEE/SICE International Symposium on System Integration, 2021

2020

Sound event aware environmental sound segmentation with Mask U-Net.

Adv. Robotics, 2020

Multi-channel Environmental sound segmentation.

Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, 2020

2019

Environmental sound segmentation utilizing Mask U-Net.

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Improvement of DOA Estimation by using Quaternion Output in Sound Event Localization and Detection.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE 2019), 2019
