Kou Tanaka

ORCID: 0009-0003-7107-607X

According to our database¹, Kou Tanaka authored at least 63 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of four.

Bibliography

2026

MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows.

CoRR, February, 2026

2025

LatentVoiceGrad: Nonparallel Voice Conversion with Latent Diffusion/Flow-Matching Models.

CoRR, September, 2025

Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting.

CoRR, June, 2025

JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Vocoder-Projected Feature Discriminator.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Rethinking Mean Opinion Scores in Speech Quality Assessment: Score Aggregation through Quantized Distribution Fitting.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

PRVAE-VC2: Non-Parallel Voice Conversion by Distillation of Speech Representations.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Selecting N-Lowest Scores for Training MOS Prediction Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator.

Takuhiro Kaneko

Hirokazu Kameoka

Kou Tanaka

Proceedings of the IEEE International Conference on Acoustics, 2024

Learning to Assess Subjective Impressions from Speech.

Proceedings of the 32nd European Signal Processing Conference, 2024

2023

Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder.

IEEE Access, 2023

PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder.

Kou Tanaka

Hirokazu Kameoka

Takuhiro Kaneko

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

CFVC: Conditional Filtering for Controllable Voice Conversion.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models.

Proceedings of the IEEE International Conference on Acoustics, 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion.

Proceedings of the 31st European Signal Processing Conference, 2023

2022

Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Many-to-Many Voice Transformer Network.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion.

Hirokazu Kameoka

Kou Tanaka

Takuhiro Kaneko

CoRR, 2021

MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Collection and Analysis of Dialogues Provided by Two Speakers Acting as One.

Tsunehiro Arimoto

Ryuichiro Higashinaka

Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2020

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019

ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

The ASVspoof 2019 database.

CoRR, 2019

Crossmodal Voice Conversion.

CoRR, 2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation.

CoRR, 2019

An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement.

Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms.

Proceedings of the IEEE International Conference on Acoustics, 2019

CycleGAN-VC2: Improved CycleGAN-Based Non-Parallel Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion.

CoRR, 2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks.

CoRR, 2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder.

CoRR, 2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.

CoRR, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms.

CoRR, 2018

Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

StarGAN-VC: Non-Parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

VAE-SPACE: Deep Generative Model of Voice Fundamental Frequency Contours.

Kou Tanaka

Hirokazu Kameoka

Kazuho Morikawa

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram.

Proceedings of the 26th European Signal Processing Conference, 2018

Automatic Speech Pronunciation Correction with Dynamic Frequency Warping-Based Spectral Conversion.

Proceedings of the 26th European Signal Processing Conference, 2018

2017

A Vibration Control Method of an Electrolarynx Based on Statistical F0 Pattern Prediction.

Kou Tanaka

Tomoki Toda

Satoshi Nakamura

IEICE Trans. Inf. Syst., 2017

Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Real-time vibration control of an electrolarynx based on statistical F0 contour prediction.

Proceedings of the 24th European Signal Processing Conference, 2016

2015

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The NAIST Text-to-Speech System for the Blizzard Challenge 2015.

Proceedings of the Blizzard Challenge 2015, 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction.

Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015

2014

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation.

IEICE Trans. Inf. Syst., 2014

Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
