Tomoki Koriyama

CoRR, February, 2026

Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model.

Speech Commun., 2026

2025

Prosody Labeling with Phoneme-BERT and Speech Foundation Models.

CoRR, July, 2025

Eigenvoice Synthesis based on Model Editing for Speaker Generation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control.

Masato Murata

Koichi Miyazaki

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech.

Dong Yang

Yuki Saito

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

An Attribute Interpolation Method in Speech Synthesis by Model Merging.

Masato Murata

Koichi Miyazaki

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

Structured State Space Decoder for Speech Recognition and Synthesis.

Koichi Miyazaki

Masato Murata

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation.

Kentaro Mitsui

Speech Commun., 2021

Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer.

Taiki Nakamura

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator.

Kazuki Mizuta

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices.

IEICE Trans. Inf. Syst., 2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes.

Kentaro Mitsui

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Utterance-Level Sequential Modeling for Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Statistical Parametric Speech Synthesis Using Deep Gaussian Processes.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

JVS corpus: free Japanese multi-speaker voice corpus.

CoRR, 2019

Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis.

Shinnosuke Takamichi

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.

Proceedings of the IEEE International Conference on Acoustics, 2019

A Training Method Using DNN-guided Layerwise Pretraining for Deep Gaussian Processes.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

GPR-based Thai speech synthesis using multi-level duration prediction.

Speech Commun., 2018

2017

Sampling-Based Speech Parameter Generation Using Moment-Matching Networks.

Shinnosuke Takamichi

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Speech emotion recognition using convolutional long short-term memory neural network and support vector machines.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A speaker adaptation technique for Gaussian process regression based speech synthesis using feature space transform.

Syohei Oshio

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling.

Comput. Speech Lang., 2015

Duration prediction using multi-level model for GPR-based speech synthesis.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis.

Speech Commun., 2014

Statistical Parametric Speech Synthesis Based on Gaussian Process Regression.

IEEE J. Sel. Top. Signal Process., 2014

Parametric speech synthesis using local and global sparse Gaussian processes.

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization.

Proceedings of the IEEE International Conference on Acoustics, 2014

HMM-based Thai speech synthesis using unsupervised stress context labeling.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

A style control technique for singing voice synthesis based on multiple-regression HSMM.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Statistical nonparametric speech synthesis using sparse Gaussian processes.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

HMM-based expressive speech synthesis based on phrase-level F0 context labeling.

Proceedings of the IEEE International Conference on Acoustics, 2013

Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

An F0 modeling technique based on prosodic events for spontaneous speech synthesis.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

On the Use of Extended Context for HMM-Based Spontaneous Conversational Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

Conversational spontaneous speech synthesis using average voice model.
