Takaaki Saeki

Orcid: 0000-0001-6003-768X

According to our database¹, Takaaki Saeki authored at least 35 papers between 2010 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model.

Speech Commun., 2026

2025

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data.

CoRR, June, 2025

Toward Data-Efficient Speech Synthesis: Active Learning-Based Corpus Construction for Multi-Speaker Text-to-Speech Synthesis.

IEEE Access, 2025

Speech Re-Painting for Robust ASR.

Pedro Moreno Mengibar, Françoise Beaufays, Andrew Rosenberg, Bhuvana Ramabhadran

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Active Learning for Text-to-Speech Synthesis with Informative Sample Collection.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2025

2024

Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features.

Proceedings of the IEEE International Conference on Acoustics, 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.

Proceedings of the IEEE International Conference on Acoustics, 2024

NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources.

IEEE Access, 2023

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.

Proceedings of the IEEE International Conference on Acoustics, 2023

YODAS: Youtube-Oriented Dataset for Audio and Speech.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection.

CoRR, 2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses.

CoRR, 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis.

CoRR, 2022

VTTS: Visual-Text To Speech.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Personalized Filled-pause Generation with Group-wise Prediction Models.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning.

Takaaki Saeki, Kentaro Tachibana, Ryuichi Yamamoto

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model.

Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

IEEE Signal Process. Lett., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.

IEICE Trans. Inf. Syst., 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.

CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

CoRR, 2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network.

Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge.

Naoki Kimura, Zixiong Su, Takaaki Saeki

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2010

Impact and Use of the Asymmetric Property in Bi-directional Cooperative Relaying under Asymmetric Traffic Conditions.

IEICE Trans. Commun., 2010
