Jiatong Shi

ORCID: 0000-0002-9050-8304

According to our database, Jiatong Shi authored at least 132 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Takashi Maekaku

,

,

Yusuke Shinohara

,

,

Chao-Han Huck Yang

,

Shinji Watanabe

CoRR, February, 2026

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback.

[DOI]

,

,

,

,

Yosuke Kashiwagi

,

,

Shinji Watanabe

CoRR, January, 2026

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks.

[DOI]

,

,

,

,

Shinji Watanabe

CoRR, January, 2026

IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition.

[DOI]

,

,

,

,

,

,

,

CoRR, January, 2026

BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction.

[DOI]

,

,

,

,

,

Shinji Watanabe

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025

Adapting Speech Language Model to Singing Voice Synthesis.

[DOI]

,

,

,

,

,

,

Shinji Watanabe

CoRR, December, 2025

CartoonSing: Unifying Human and Nonhuman Timbres in Singing Generation.

[DOI]

,

,

,

,

,

,

Shinji Watanabe

CoRR, November, 2025

SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications.

[DOI]

,

,

,

,

,

,

,

Shinji Watanabe

CoRR, November, 2025

Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play.

[DOI]

,

,

,

Santiago Pascual

,

,

,

Shinji Watanabe

,

,

CoRR, November, 2025

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems.

[DOI]

,

,

,

,

Yosuke Kashiwagi

,

,

Shinji Watanabe

CoRR, October, 2025

SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment.

[DOI]

,

,

,

,

,

,

,

CoRR, October, 2025

The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion.

[DOI]

Lester Phillip Violeta

,

,

,

,

,

,

CoRR, September, 2025

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment.

[DOI]

,

,

,

,

Shinji Watanabe

,

CoRR, June, 2025

DiscoSum: Discourse-aware News Summarization.

[DOI]

Alexander Spangher

,

,

,

,

CoRR, June, 2025

MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling.

[DOI]

,

,

CoRR, May, 2025

Discrete Audio Tokens: More Than a Survey!

[DOI]

Trans. Mach. Learn. Res., 2025

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation.

[DOI]

,

,

,

,

,

Samuele Cornell

,

,

,

Shinji Watanabe

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ESPnet-SpeechLM: An Open Speech Language Model Toolkit.

[DOI]

,

,

,

,

Yoshiki Masuyama

,

Takashi Maekaku

,

,

,

Shikhar Bharadwaj

,

,

Samuele Cornell

,

,

,

Chao-Han Huck Yang

,

,

Shinji Watanabe

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music.

[DOI]

,

,

,

,

,

Darius Petermann

,

,

,

,

,

Dareen Alharthi

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems.

[DOI]

,

,

,

,

,

Shikhar Bharadwaj

,

,

Yosuke Kashiwagi

,

,

Shuichiro Shimizu

,

Vaibhav Srivastav

,

Shinji Watanabe

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

FoodPuzzle: Toward Developing Large Language Model Agents as Autonomous Flavor Scientists.

[DOI]

,

,

,

,

Emily Steliotes

,

,

,

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

Aligning Text-to-Music Evaluation with Human Preferences.

[DOI]

,

,

,

,

Shinji Watanabe

,

,

,

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

OpusLM: A Family of Open Unified Speech Language Models.

[DOI]

,

,

,

,

,

Shikhar Bharadwaj

,

Takashi Maekaku

,

Yusuke Shinohara

,

,

,

,

Shinji Watanabe

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Uni-VERSA: Versatile Speech Assessment with a Unified Network.

[DOI]

,

,

Shinji Watanabe

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Scalable Spontaneous Speech Dataset (SSSD): Crowdsourcing Data Collection to Promote Dialogue Research.

[DOI]

,

Shuichiro Shimizu

,

,

,

Samuele Cornell

,

,

Shinji Watanabe

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Bridging Speech and Singing: Multi-stage Speech-Prompted Singing Voice Conversion with Speaker Embedding Adaptation.

[DOI]

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

MIKU-PAL: An Automated and Standardized Multimodal Method for Speech Paralinguistic and Affect Labeling.

[DOI]

,

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties.

[DOI]

,

,

,

Martijn Bartelds

,

,

Hsiu-Hsuan Wang

,

Rafael Mosquera

,

,

,

Antonis Anastasopoulos

,

,

,

Shinji Watanabe

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems.

[DOI]

,

,

,

,

,

Yosuke Kashiwagi

,

,

Shinji Watanabe

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

[DOI]

,

,

,

,

,

,

Wei-Cheng Tseng

,

,

,

,

,

,

,

,

,

,

,

,

,

Fabian Alejandro Ritter Gutierrez

,

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Preference Alignment Improves Language Model-Based TTS.

[DOI]

,

,

,

,

,

Shinji Watanabe

,

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Robust Training of Singing Voice Synthesis Using Prior and Posterior Uncertainty.

[DOI]

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

Continual Pre-training for Codec-Based Speech LLMs: Balancing Understanding and Generation.

[DOI]

,

,

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning.

[DOI]

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

VERSA-v2: A Modular and Scalable Toolkit for Speech and Audio Evaluation with Expanded Metrics, Visualization, and LLM Integration.

[DOI]

,

,

Shikhar Bharadwaj

,

,

,

,

,

,

,

,

Nezih Topaloglu

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2025

2024

A Large-Scale Evaluation of Speech Foundation Models.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2024

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario.

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

[DOI]

,

,

,

,

,

,

Wei-Cheng Tseng

,

,

,

,

,

,

,

,

,

,

,

,

,

Fabian Ritter Gutierrez

,

,

,

,

,

,

,

Chung-Ming Chien

,

,

Cheng-Hsiu Hsieh

,

,

,

,

Heitor R. Guimarães

,

,

,

,

,

,

,

,

,

,

,

,

,

Kuan-Yu Fang Chiang

,

,

,

,

Shao-Syuan Huang

,

,

,

,

,

,

,

,

,

,

Shih-Yun Shan Kuan

,

,

,

,

,

,

,

,

Chao-Han Huck Yang

,

,

,

Shao-Xiang Yuan

,

,

,

,

,

,

Shinji Watanabe

,

CoRR, 2024

Findings of the IWSLT 2024 Evaluation Campaign.

[DOI]

CoRR, 2024

FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists.

[DOI]

,

,

,

,

Emily Steliotes

,

,

,

CoRR, 2024

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction.

[DOI]

,

,

,

CoRR, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.

[DOI]

,

,

Hirofumi Inaguma

,

,

Shinji Watanabe

CoRR, 2024

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.

[DOI]

,

,

,

Shinji Watanabe

CoRR, 2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.

[DOI]

,

Muhammad Shakeel

,

Yosuke Fukumoto

,

,

,

,

Shinji Watanabe

CoRR, 2024

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan.

[DOI]

,

,

,

Ryuichi Yamamoto

,

,

,

,

CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.

[DOI]

,

,

,

,

,

,

,

,

Shinji Watanabe

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

[DOI]

,

,

,

Zakaria Aldeneh

,

,

Barry-John Theobald

,

Ahmed Hussen Abdelaziz

,

Shinji Watanabe

CoRR, 2024

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.

[DOI]

,

,

,

Ryuichi Yamamoto

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Visinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.

[DOI]

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Fusion Of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition.

[DOI]

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration.

[DOI]

,

,

,

,

Samuele Cornell

,

,

,

,

Vaibhav Srivastav

,

Shinji Watanabe

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech.

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.

[DOI]

,

,

,

,

,

,

,

,

Shinji Watanabe

,

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Findings of the IWSLT 2024 Evaluation Campaign.

[DOI]

Proceedings of the 21st International Conference on Spoken Language Translation, 2024

A Systematic Exploration of Joint-Training for Singing Voice Synthesis.

[DOI]

,

,

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

An Exploration on Singing MOS Prediction.

[DOI]

,

,

,

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.

[DOI]

,

,

,

Ryuichi Yamamoto

,

,

,

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens.

[DOI]

,

,

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models.

[DOI]

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios.

[DOI]

Tejes Srivastava

,

,

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.

[DOI]

,

,

,

Martijn Bartelds

,

Vanya Bannihatti Kumar

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.

[DOI]

,

,

Hirofumi Inaguma

,

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing.

[DOI]

,

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

[DOI]

,

,

,

,

,

,

Muhammad Shakeel

,

,

,

,

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model.

[DOI]

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

[DOI]

,

,

,

Zakaria Aldeneh

,

,

,

Barry-John Theobald

,

Ahmed Hussen Abdelaziz

,

Shinji Watanabe

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

[DOI]

,

,

,

,

,

,

Shinji Watanabe

,

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Self-supervised Speech Representations Still Struggle with African American Vernacular English.

[DOI]

,

,

,

Hsuan-Ming Chen

,

Nicole Holliday

,

Odette Scharenborg

,

David R. Mortensen

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction.

[DOI]

,

Hirofumi Inaguma

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model.

[DOI]

Takashi Maekaku

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2024

Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.

[DOI]

,

,

,

,

,

,

,

,

,

,

Roshan S. Sharma

,

Shinji Watanabe

,

Bhiksha Ramakrishnan

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

[DOI]

,

,

,

,

,

,

Roshan S. Sharma

,

,

,

Shinji Watanabe

,

,

Takashi Maekaku

,

,

,

,

,

Hsiu-Hsuan Wang

Proceedings of the IEEE International Conference on Acoustics, 2024

Exploiting Longitudinal Speech Sessions via Voice Assistant Systems for Early Detection of Cognitive Decline.

[DOI]

,

,

Caroline Summerour

,

,

Proceedings of the IEEE International Conference on E-health Networking, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

[DOI]

,

,

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.

[DOI]

,

,

,

Nathaniel R. Robinson

,

,

Shinji Watanabe

,

,

David R. Mortensen

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

An iteration-based interactive attention network for 3D point cloud registration.

[DOI]

,

,

,

Neurocomputing, December, 2023

A dynamic graph aggregation framework for 3D point cloud registration.

[DOI]

,

,

Eng. Appl. Artif. Intell., April, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.

[DOI]

,

,

,

Hsiu-Hsuan Wang

,

,

,

,

,

,

,

Abdelrahman Mohamed

,

,

Shinji Watanabe

CoRR, 2023

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.

[DOI]

Tejes Srivastava

,

,

,

Shinji Watanabe

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Shinji Watanabe

,

CoRR, 2023

The Singing Voice Conversion Challenge 2023.

[DOI]

,

Lester Phillip Violeta

,

,

,

,

CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.

[DOI]

,

,

Hirofumi Inaguma

,

,

,

Shinji Watanabe

CoRR, 2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation.

[DOI]

,

Liang-Hsuan Tseng

,

,

,

,

Shinji Watanabe

,

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

Shinji Watanabe

CoRR, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

[DOI]

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Findings of the IWSLT 2023 Evaluation Campaign.

[DOI]

,

Antonios Anastasopoulos

,

Luisa Bentivogli

,

,

,

,

Roldano Cattoni

,

,

,

,

,

Alexandra Chronopoulou

,

,

Thierry Declerck

,

,

,

Yannick Estève

,

Marcello Federico

,

Souhir Gahbiche

,

,

,

,

Hirofumi Inaguma

,

Dávid Javorský

,

,

,

,

,

,

,

Prashant Mathur

,

,

,

,

,

,

Satoshi Nakamura

,

,

,

,

,

,

,

,

,

Lonneke van der Plas

,

,

,

Elizabeth Salesky

,

,

Matthias Sperber

,

Sebastian Stüker

,

Katsuhito Sudoh

,

,

,

,

,

,

,

Shinji Watanabe

,

Rodolfo Zevallos

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.

[DOI]

,

Muhammad Shakeel

,

,

,

Shinji Watanabe

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.

[DOI]

,

,

,

,

,

,

,

,

Abdelrahman Mohamed

,

,

Shinji Watanabe

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploration on HuBERT with Multiple Resolution.

[DOI]

,

,

Hirofumi Inaguma

,

,

,

Shinji Watanabe

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.

[DOI]

,

,

,

Hirofumi Inaguma

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.

[DOI]

,

,

,

,

,

Shinji Watanabe

,

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Euro: Espnet Unsupervised ASR Open-Source Toolkit.

[DOI]

,

,

,

Leibny Paola García

,

,

Shinji Watanabe

,

Sanjeev Khudanpur

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

[DOI]

,

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE International Conference on Acoustics, 2023

Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond.

[DOI]

,

,

,

Hsiu-Hsuan Wang

,

,

,

,

,

,

,

Abdelrahman Mohamed

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

[DOI]

,

,

,

,

,

,

,

,

,

Roshan S. Sharma

,

,

,

Muhammad Shakeel

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The Singing Voice Conversion Challenge 2023.

[DOI]

,

Lester Phillip Violeta

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Evaluating Self-Supervised Speech Models on a Taiwanese Hokkien Corpus.

[DOI]

,

,

,

,

Alice Wen-Hsin Bi

,

,

,

,

,

,

Iu-Tshian Phoann

,

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

[DOI]

,

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

[DOI]

,

,

,

Hirofumi Inaguma

,

,

Siddharth Dalmia

,

,

Patrick Fernandes

,

,

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

UniLG: A Unified Structure-aware Framework for Lyrics Generation.

[DOI]

,

,

,

,

,

,

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.

[DOI]

,

,

,

Shinji Watanabe

,

,

Comput. Speech Lang., 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.

[DOI]

,

,

,

,

,

,

,

,

,

,

Shinji Watanabe

,

CoRR, 2022

On Compressing Sequences for Self-Supervised Speech Models.

[DOI]

,

,

,

Shinji Watanabe

,

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.

[DOI]

,

Shuyan Annie Dong

,

,

,

,

,

,

,

,

,

Shinji Watanabe

,

Abdelrahman Mohamed

,

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.

[DOI]

,

Patrick Fernandes

,

Siddharth Dalmia

,

,

,

,

,

,

Shinji Watanabe

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Findings of the IWSLT 2022 Evaluation Campaign.

[DOI]

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.

[DOI]

,

Muhammad Shakeel

,

Kazuhiro Nakadai

,

,

Shinji Watanabe

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

[DOI]

,

,

,

Shinji Watanabe

,

Brian Kingsbury

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

[DOI]

,

,

,

,

,

,

,

,

,

Shinji Watanabe

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.

[DOI]

,

,

,

Shinji Watanabe

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.

[DOI]

,

Shinji Watanabe

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.

[DOI]

,

,

,

Osbel López-Francisco

,

Jonathan D. Amith

,

Shinji Watanabe

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards end-to-end Speaker Diarization with Generalized Neural Speaker Clustering.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Training Strategies for Automatic Song Writing: A Unified Framework Perspective.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.

[DOI]

Hsiang-Sheng Tsai

,

,

,

,

Kushal Lakhotia

,

,

,

,

,

,

,

,

,

,

Shinji Watanabe

,

Abdelrahman Mohamed

,

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Leveraging deep learning with audio analytics to predict the success of crowdfunding projects.

[DOI]

,

,

,

J. Supercomput., 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

[DOI]

,

Ryuichi Yamamoto

,

Takenori Yoshimura

,

,

,

,

,

,

Shinnosuke Takamichi

,

Shinji Watanabe

CoRR, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.

[DOI]

Hirofumi Inaguma

,

,

Siddharth Dalmia

,

,

,

,

Shinji Watanabe

Proceedings of the 18th International Conference on Spoken Language Translation, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.

[DOI]

,

,

,

Shinji Watanabe

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss.

[DOI]

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Recent Developments on Espnet Toolkit Boosted By Conformer.

[DOI]

,

,

,

,

,

Hirofumi Inaguma

,

,

,

Daniel Garcia-Romero

,

,

,

Shinji Watanabe

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.

[DOI]

,

Jonathan D. Amith

,

Rey Castillo García

,

Esteban Guadalupe Sierra

,

,

Shinji Watanabe

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity.

[DOI]

,

,

,

Shinji Watanabe

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.

[DOI]

,

,

,

Ruslan Salakhutdinov

,

Shinji Watanabe

,

Louis-Philippe Morency

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training.

[DOI]

,

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning.

[DOI]

,

,

,

,

,

Takahiro Shinozaki

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2018

Identifying Impact Factors of Question Quality in Online Health Q&A Communities: an Empirical Analysis on MedHelp.

[DOI]

,

,

Proceedings of the 22nd Pacific Asia Conference on Information Systems, 2018
