Yusuke Ijima

According to our database, Yusuke Ijima authored at least 60 papers between 2008 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Multi-interaction TTS toward professional recording reproduction.

Hiroki Kanagawa

,

,

,

CoRR, July, 2025

Voice Impression Control in Zero-Shot TTS.

Keinichi Fujita

,

Shota Horiguchi

,

CoRR, June, 2025

Voice Impression Control in Zero-Shot TTS.

,

Shota Horiguchi

,

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.

,

,

IEICE Trans. Inf. Syst., January, 2024

Unveiling the Linguistic Capabilities of a Self-Supervised Speech Model Through Cross-Lingual Benchmark and Layer-Wise Similarity Analysis.

Takanori Ashihara

,

,

,

IEEE Access, 2024

Pre-training Neural Transducer-based Streaming Voice Conversion for Faster Convergence and Alignment-free Training.

Hiroki Kanagawa

,

Takafumi Moriya

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Knowledge Distillation from Self-Supervised Representation Learning Model with Discrete Speech Units for Any-to-Any Streaming Voice Conversion.

Hiroki Kanagawa

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Lightweight Zero-shot Text-to-Speech with Mixture of Adapters.

,

Takanori Ashihara

,

,

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

STYLECAP: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-Supervised Learning Models.

Kazuki Yamauchi

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters.

,

,

Takanori Ashihara

,

Hiroki Kanagawa

,

,

Takafumi Moriya

,

Proceedings of the IEEE International Conference on Acoustics, 2024

What Do Self-Supervised Speech and Speaker Models Learn? New Findings from a Cross Model Layer-Wise Analysis.

Takanori Ashihara

,

,

Takafumi Moriya

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Influence of Personal Traits on Impressions of One's Own Voice.

Hikaru Yanagida

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A stimulus-organism-response model of willingness to buy from advertising speech using voice quality.

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

VC-T: Streaming Voice Conversion Based on Neural Transducer.

Hiroki Kanagawa

,

Takafumi Moriya

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Takanori Ashihara

,

Takafumi Moriya

,

,

Tomohiro Tanaka

,

,

,

,

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhancement of Text-Predicting Style Token With Generative Adversarial Network for Expressive Speech Synthesis.

Hiroki Kanagawa

,

Proceedings of the IEEE International Conference on Acoustics, 2023

Zero-Shot Text-to-Speech Synthesis Conditioned Using Self-Supervised Speech Representation Model.

,

Takanori Ashihara

,

Hiroki Kanagawa

,

Takafumi Moriya

,

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

SIMD-Size Aware Weight Regularization for Fast Neural Vocoding on CPU.

Hiroki Kanagawa

,

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Automated Recognition of Off Phenomenon in Parkinson's Disease During Walking : - Measurement in Daily Life with Wearable Device -.

,

,

,

,

Sadayoshi Mikami

Proceedings of the 4th IEEE Global Conference on Life Sciences and Technologies, 2022

Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis.

,

Tomoki Koriyama

,

Shinnosuke Takamichi

,

,

,

,

Hiroshi Saruwatari

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Joint Modeling of Multi-Sample and Subband Signals for Fast Neural Vocoding on CPU.

Hiroki Kanagawa

,

,

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Sample Subband Wavernn Via Multivariate Gaussian.

Hiroki Kanagawa

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Model architectures to extrapolate emotional expressions in DNN-based text-to-speech.

,

,

,

,

Speech Commun., 2021

Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings.

,

Tomoki Koriyama

,

Shinnosuke Takamichi

,

,

,

,

Hiroshi Saruwatari

Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Impact of Emotional State on Estimation of Willingness to Buy from Advertising Speech.

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phonetic and Prosodic Information Estimation from Texts for Genuine Japanese End-to-End Text-to-Speech.

,

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis.

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.

Takafumi Moriya

,

Takanori Ashihara

,

Tomohiro Tanaka

,

,

,

,

,

,

Yusuke Shinohara

Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Emotion Recognition Based on Listener Adaptive Models.

,

,

,

Takafumi Moriya

,

Takanori Ashihara

,

,

Proceedings of the IEEE International Conference on Acoustics, 2021

Robust Speech-Age Estimation Using Local Maximum Mean Discrepancy Under Mismatched Recording Conditions.

,

,

,

Hosana Kamiyama

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus.

,

Tomoki Koriyama

,

,

Shinnosuke Takamichi

,

,

,

Hiroshi Saruwatari

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis.

,

Tomoki Koriyama

,

,

Shinnosuke Takamichi

,

,

,

Hiroshi Saruwatari

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition.

Hiroki Kanagawa

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

V2S attack: building DNN-based voice conversion from automatic speaker verification.

,

,

Shinnosuke Takamichi

,

,

Hiroshi Saruwatari

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Multi-Speaker Modeling for DNN-based Speech Synthesis Incorporating Generative Adversarial Networks.

Hiroki Kanagawa

,

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders.

,

,

Tomohiro Tanaka

,

Takafumi Moriya

,

,

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Can We Simulate Generative Process of Acoustic Modeling Data? Towards Data Restoration for Acoustic Modeling.

,

,

Satoshi Kobashikawa

,

,

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018

DNN-Based Speech Synthesis Using Speaker Codes.

,

,

Hideyuki Mizuno

IEICE Trans. Inf. Syst., 2018

Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors.

,

,

Kyosuke Nishida

,

Shinnosuke Takamichi

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Neural Confnet Classification: Fully Neural Network Based Spoken Utterance Classification Using Word Confusion Networks.

,

,

,

Hirokazu Masataki

,

Ryuichiro Higashinaka

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification.

,

Satoshi Kobashikawa

,

Hosana Kamiyama

,

,

,

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.

,

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation.

,

Yasuhito Ohsugi

,

,

Hirokazu Kameoka

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generative adversarial network-based postfilter for statistical parametric speech synthesis.

Takuhiro Kaneko

,

Hirokazu Kameoka

,

,

,

Kaoru Hiramatsu

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An investigation to transplant emotional expressions in DNN-based TTS synthesis.

,

,

,

,

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis.

,

,

Hideyuki Mizuno

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An Investigation of DNN-Based Speech Synthesis Using Speaker Codes.

,

,

Hideyuki Mizuno

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Statistical model training technique based on speaker clustering approach for HMM-based speech synthesis.

,

Noboru Miyazaki

,

Hideyuki Mizuno

,

Sumitaka Sakauchi

Speech Commun., 2015

Similar Speaker Selection Technique Based on Distance Metric Learning Using Highly Correlated Acoustic Features with Perceptual Voice Quality Similarity.

,

Hideyuki Mizuno

IEICE Trans. Inf. Syst., 2015

Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum.

,

,

,

,

Noboru Miyazaki

,

Hideyuki Mizuno

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis.

,

,

Takao Kobayashi

,

Tomoki Koriyama

,

,

Hideharu Nakajima

,

Hideyuki Mizuno

,

Speech Commun., 2014

2013

Statistical model training technique for speech synthesis based on speaker class.

,

Noboru Miyazaki

,

Hideyuki Mizuno

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

HMM-based expressive speech synthesis based on phrase-level F0 context labeling.

,

,

Takao Kobayashi

,

Tomoki Koriyama

,

,

Hideharu Nakajima

,

Hideyuki Mizuno

,

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Similar Speaker Selection Technique Based on Distance Metric Learning with Perceptual Voice Quality Similarity.

,

Mitsuaki Isogai

,

Hideyuki Mizuno

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling.

,

,

Takao Kobayashi

,

,

Hideharu Nakajima

,

Hideyuki Mizuno

,

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Correlation Analysis of Acoustic Features with Perceptual Voice Quality Similarity for Similar Speaker Selection.

,

Mitsuaki Isogai

,

Hideyuki Mizuno

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM.

,

,

Makoto Tachibana

,

Takao Kobayashi

IEICE Trans. Inf. Syst., 2010

2009

Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM.

,

Takeshi Matsubara

,

,

Takao Kobayashi

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Emotional speech recognition based on style estimation and adaptation with multiple-regression HMM.

,

Makoto Tachibana

,

,

Takao Kobayashi

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

An on-line adaptation technique for emotional speech recognition using style estimation with multiple-regression HMM.

,

Makoto Tachibana

,

,

Takao Kobayashi

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
