Takuhiro Kaneko

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Unsupervised Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training.

CoRR, 2024

PRVAE-VC2: Non-Parallel Voice Conversion by Distillation of Speech Representations.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Selecting N-Lowest Scores for Training MOS Prediction Models.

Proceedings of the IEEE International Conference on Acoustics, 2024

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator.

Kou Tanaka

Proceedings of the IEEE International Conference on Acoustics, 2024

Learning to Assess Subjective Impressions from Speech.

Proceedings of the 32nd European Signal Processing Conference, 2024

Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder.

IEEE Access, 2023

PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder.

Kou Tanaka

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

CFVC: Conditional Filtering for Controllable Voice Conversion.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion.

Ayaka Ideno

Proceedings of the 25th International Conference on Multimodal Interaction, 2023

MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models.

Proceedings of the IEEE International Conference on Acoustics, 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion.

Proceedings of the 31st European Signal Processing Conference, 2023

Unsupervised Intrinsic Image Decomposition with LiDAR Intensity.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

MISRNet: Lightweight Neural Vocoder Using Multi-Input Single Shared Residual Blocks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CAUSE: Crossmodal Action Unit Sequence Estimation from Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform.

Proceedings of the IEEE International Conference on Acoustics, 2022

AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Many-to-Many Voice Transformer Network.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion.

Kou Tanaka

CoRR, 2021

MaskCycleGAN-VC: Learning Non-Parallel Voice Conversion with Filling in Frames.

Proceedings of the IEEE International Conference on Acoustics, 2021

Blur, Noise, and Compression Robust Generative Adversarial Networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Unsupervised Learning of Depth and Depth-of-Field Effect From Natural Images With Aperture Rendering Generative Adversarial Networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Noise Robust Generative Adversarial Networks.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Label-Noise Robust Multi-Domain Image-to-Image Translation.

CoRR, 2019

Crossmodal Voice Conversion.

CoRR, 2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation.

CoRR, 2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

AttS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms.

Proceedings of the IEEE International Conference on Acoustics, 2019

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2019

Label-Noise Robust Generative Adversarial Networks.

Yoshitaka Ushiku

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Class-Distinct and Class-Mutual Image Generation with GANs.

Yoshitaka Ushiku

Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion.

CoRR, 2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks.

CoRR, 2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder.

CoRR, 2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.

CoRR, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms.

CoRR, 2018

Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

StarGAN-VC: Non-parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrogram.

Proceedings of the 26th European Signal Processing Conference, 2018

CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks.

Proceedings of the 26th European Signal Processing Conference, 2018

Automatic Speech Pronunciation Correction with Dynamic Frequency Warping-Based Spectral Conversion.

Proceedings of the 26th European Signal Processing Conference, 2018

Generative Adversarial Image Synthesis With Decision Tree Latent Controller.

Kaoru Hiramatsu

Kunio Kashino

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks.

CoRR, 2017

Generative Adversarial Network-Based Postfilter for STFT Spectrograms.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generative adversarial network-based postfilter for statistical parametric speech synthesis.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks.

Kaoru Hiramatsu

Kunio Kashino

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Non-native speech conversion with consistency-aware recursive network and generative adversarial network.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016

Collective activity localization by spatiality preservation search.

Adv. Robotics, 2016

Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks.

Kaoru Hiramatsu

Kunio Kashino

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2014

A fully connected model for consistent collective activity recognition in videos.

Pattern Recognit. Lett., 2014

Modeling risk anticipation and defensive driving on residential roads with inverse reinforcement learning.

Masamichi Shimosaka