Yuki Saito

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech.

[BibT_eX]

[DOI]

Dong Yang

Tomoki Koriyama

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

STYLECAP: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-Supervised Learning Models.

[BibT_eX]

[DOI]

Kazuki Yamauchi

Yusuke Ijima

Proceedings of the IEEE International Conference on Acoustics, 2024

NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec.

[BibT_eX]

[DOI]

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human-Avatar Dialogue Systems.

[BibT_eX]

[DOI]

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2024

2023

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions.

[BibT_eX]

[DOI]

Dataset, October, 2023

Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion.

[BibT_eX]

[DOI]

Ryunosuke Hirai

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

HumanDiffusion: diffusion model using perceptual gradients.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center.

[BibT_eX]

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech.

[BibT_eX]

[DOI]

CoRR, 2022

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS.

[BibT_eX]

[DOI]

Kenta Udagawa

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2021

DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2021

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Humanacgan: Conditional Generative Adversarial Network with Human-Based Auxiliary Classifier and its Evaluation in Phoneme Perception.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2021

Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks.

[BibT_eX]

[DOI]

Signal Process., 2020

Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams.

[BibT_eX]

[DOI]

Kei Akuzawa

Kentaro Tachibana

IEICE Trans. Inf. Syst., 2020

Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.

[BibT_eX]

[DOI]

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay.

[BibT_eX]

[DOI]

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Humangan: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2019

JVS corpus: free Japanese multi-speaker voice corpus.

[BibT_eX]

[DOI]

CoRR, 2019

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

V2S attack: building DNN-based voice conversion from automatic speaker verification.

[BibT_eX]

[DOI]

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network.

[BibT_eX]

[DOI]

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.

[BibT_eX]

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech.

[BibT_eX]

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

Voice Conversion Using Input-to-Output Highway Networks.

[BibT_eX]

[DOI]

IEICE Trans. Inf. Syst., 2017

Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities.

[BibT_eX]

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis.

[BibT_eX]

[DOI]