Junichi Yamagishi

ORCID: 0000-0003-2752-3955

Affiliations:
  • National Institute of Informatics, Tokyo, Japan
  • University of Edinburgh, Scotland, UK (former)

According to our database, Junichi Yamagishi authored at least 405 papers between 2002 and 2025.

Bibliography

2025
Speech Generation for Indigenous Language Education.
Comput. Speech Lang., 2025

2024
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

ZMM-TTS: Zero-Shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-Supervised Discrete Speech Representations.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances.
Comput. Speech Lang., 2024

Improving curriculum learning for target speaker extraction with synthetic speakers.
CoRR, 2024

AfriHuBERT: A self-supervised speech representation model for African languages.
CoRR, 2024

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction.
CoRR, 2024

Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches.
CoRR, 2024

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion.
CoRR, 2024

A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection.
CoRR, 2024

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale.
CoRR, 2024

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis.
CoRR, 2024

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios.
CoRR, 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems.
CoRR, 2024

Target Speaker Extraction with Curriculum Learning.
CoRR, 2024

Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio.
CoRR, 2024

To what extent can ASV systems naturally defend against spoofing attacks?
CoRR, 2024

Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis.
CoRR, 2024

The VoicePrivacy 2024 Challenge Evaluation Plan.
CoRR, 2024

Analysis of Fine-Grained Counting Methods for Masked Face Counting: A Comparative Study.
IEEE Access, 2024

eKYC-DF: A Large-Scale Deepfake Dataset for Developing and Evaluating eKYC Systems.
IEEE Access, 2024

Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Synvox2: Towards A Privacy-Friendly Voxceleb2 Dataset.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Spoofing Attack Augmentation: Can Differently-Trained Attack Models Improve Generalisation?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

2023
Model checkpoints for "XFEVER: Exploring Fact Verification across Languages".
Dataset, October, 2023

BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer.
ACM Trans. Graph., August, 2023

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker Anonymization Using Orthogonal Householder Neural Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Speaker-Text Retrieval via Contrastive Learning.
CoRR, 2023

XFEVER: Exploring Fact Verification across Languages.
CoRR, 2023

DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input.
CoRR, 2023

Language-independent speaker anonymization using orthogonal Householder neural network.
CoRR, 2023

Cyber Vaccine for Deepfake Immunity.
IEEE Access, 2023

Analysis of Master Vein Attacks on Finger Vein Recognition Systems.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Range-Based Equal Error Rate for Spoof Localization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Single Integrated Spoofing-aware Speaker Verification Embeddings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

How Close Are Other Computer Vision Tasks to Deepfake Detection?
Proceedings of the IEEE International Joint Conference on Biometrics, 2023

Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural Midi-to-Audio Synthesis Systems?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Hiding Speaker's Sex in Speech Using Zero-Evidence Speaker Representation in an Analysis/Synthesis Pipeline.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-Supervised Setting.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

The VoiceMOS Challenge 2023: Zero-Shot Subjective Speech Quality Prediction for Multiple Domains.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Exploring Isolated Musical Notes as Pre-training Data for Predominant Instrument Recognition in Polyphonic Music.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Revisiting Pathologies of Neural Models under Input Reduction.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Device Recorded VCTK (DR-VCTK).
Dataset, June, 2022

Master Face Attacks on Face Recognition Systems.
IEEE Trans. Biom. Behav. Identity Sci., 2022

Privacy and Utility of X-Vector Based Speaker Anonymization.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Optimizing Tandem Speaker Verification and Anti-Spoofing Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

SVSNet: An End-to-End Speaker Voice Similarity Assessment Model.
IEEE Signal Process. Lett., 2022

Effects of Image Processing Operations on Adversarial Noise and Their Use in Detecting and Correcting Adversarial Images.
IEICE Trans. Inf. Syst., 2022

The VoicePrivacy 2020 Challenge: Results and findings.
Comput. Speech Lang., 2022

The VoicePrivacy 2020 Challenge Evaluation Plan.
CoRR, 2022

The PartialSpoof Database and Countermeasures for the Detection of Short Generated Audio Segments Embedded in a Speech Utterance.
CoRR, 2022

The VoicePrivacy 2022 Challenge Evaluation Plan.
CoRR, 2022

Robust Deepfake On Unrestricted Media: Generation And Detection.
CoRR, 2022

A Practical Guide to Logical Access Voice Presentation Attack Detection.
CoRR, 2022

Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Automatic Speaker Verification Spoofing and Deepfake Detection Using Wav2vec 2.0 and Data Augmentation.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Investigating Self-Supervised Front Ends for Speech Spoofing Countermeasures.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Lessons Learned from ASVSpoof and Remaining Challenges.
Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia (DDAM@MM 2022), 2022

Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

DDS: A new device-degraded speech dataset for speech enhancement.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

The VoiceMOS Challenge 2022.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Outlier-Aware Training for Improving Group Accuracy Disparities.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

Attention Back-End for Automatic Speaker Verification with Multiple Enrollment Utterances.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Estimating the Confidence of Speech Spoofing Countermeasure.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Generalization Ability of MOS Prediction Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Mitigating the Diminishing Effect of Elastic Weight Consolidation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Capsule-Forensics Networks for Deepfake Detection.
Handbook of Digital Face Manipulation and Detection, 2022

2021
ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech.
IEEE Trans. Biom. Behav. Identity Sci., 2021

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Generation and Detection of Media Clones.
IEICE Trans. Inf. Syst., 2021

Preventing Fake Information Generation Against Media Clone Attacks.
IEICE Trans. Inf. Syst., 2021

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis.
Comput. Speech Lang., 2021

Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio.
CoRR, 2021

LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example.
CoRR, 2021

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection.
CoRR, 2021

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan.
CoRR, 2021

Benchmarking and challenges in security and privacy for voice biometrics.
CoRR, 2021

Multi-Task Learning in Utterance-Level and Segmental-Level Spoof Detection.
CoRR, 2021

Use of speaker recognition approaches for learning timbre representations of musical instrument sounds from raw waveforms.
CoRR, 2021

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances.
CoRR, 2021

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

How do Voices from Past Speech Synthesis Challenges Compare Today?
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

An Initial Investigation for Detecting Partially Spoofed Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

How Similar or Different is Rakugo Speech Synthesizer to Professional Performers?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021

Fashion-Guided Adversarial Attack on Person Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

A Multi-Level Attention Model for Evidence-Based Fact Checking.
Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

NAUTILUS: A Versatile Voice Cloning System.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech.
Comput. Speech Lang., 2020

Introduction to the special issue "Speaker and language characterization and recognition: Voice modeling, conversion, synthesis and ethical aspects".
Comput. Speech Lang., 2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2020

Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations.
CoRR, 2020

Viable Threat on News Reading: Generating Biased News Using Natural Language Models.
CoRR, 2020

Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences.
IEEE Access, 2020

An Initial Investigation on Optimizing Tandem Speaker Verification and Countermeasure Systems Using Reinforcement Learning.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Introducing the VoicePrivacy Initiative.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Design Choices for X-Vector Based Speaker Anonymization.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Reverberation Modeling for Source-Filter-Based Neural Vocoder.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Security of Facial Forensics Models Against Adversarial Attacks.
Proceedings of the IEEE International Conference on Image Processing, 2020

Generating Master Faces for Use in Performing Wolf Attacks on Face Recognition Systems.
Proceedings of the 2020 IEEE International Joint Conference on Biometrics, 2020

Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Conversion Challenge 2020 -- Intra-lingual semi-parallel and cross-lingual voice conversion --.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Color Transfer to Anonymized Gait Images While Maintaining Anonymization.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

A Method for Identifying Origin of Digital Images Using a Convolutional Neural Network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-Based Detection.
Proceedings of the International Conference on Advanced Information Networking and Applications, 2020

2019
Introduction to Voice Presentation Attack Detection and Recent Advances.
Handbook of Biometric Anti-Spoofing, 2019

Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Spatio-temporal generative adversarial network for gait anonymization.
J. Inf. Secur. Appl., 2019

Detecting and Correcting Adversarial Images Using Image Processing Operations.
CoRR, 2019

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model.
CoRR, 2019

The ASVspoof 2019 database.
CoRR, 2019

A Method for Identifying Origin of Digital Images Using a Convolution Neural Network.
CoRR, 2019

Use of a Capsule Network to Detect Fake Images and Videos.
CoRR, 2019

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments.
CoRR, 2019

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation.
CoRR, 2019

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform.
CoRR, 2019

Introduction to Voice Presentation Attack Detection and Recent Advances.
CoRR, 2019

Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Speaker Anonymization Using X-vector and Neural Waveform Models.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Neural Source-filter-based Waveform Model for Statistical Parametric Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

STFT Spectral Loss for Training a Neural Speech Waveform Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Attentive Filtering Networks for Audio Replay Attack Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Audiovisual Speaker Conversion: Jointly and Simultaneously Transforming Facial Expression and Acoustic Characteristics.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos.
Proceedings of the 10th IEEE International Conference on Biometrics Theory, Applications and Systems, 2019

Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

An RGB Gait Anonymization Model for Low-Quality Silhouettes.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigating very deep highway networks for parametric speech synthesis.
Speech Commun., 2018

Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis.
Speech Commun., 2018

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis.
CoRR, 2018

Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra.
CoRR, 2018

Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder.
IEEE Access, 2018

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems.
Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, 2018

MesoNet: a Compact Facial Video Forgery Detection Network.
Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, 2018

Scaling and Bias Codes for Modeling Speaker-Adaptive DNN-Based Speech Synthesis Systems.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Identifying Computer-Translated Paragraphs using Coherence Features.
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, 2018

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speaker-independent Raw Waveform Model for Glottal Excitation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Expressive Speech Synthesis Using Sentiment Embeddings.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Transformation on Computer-Generated Facial Image to Avoid Detection by Spoofing Detector.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

Modular Convolutional Neural Network for Discriminating between Computer-Generated Images and Photographic Images.
Proceedings of the 13th International Conference on Availability, Reliability and Security, 2018

2017
Introduction to the Issue on Spoofing and Countermeasures for Automatic Speaker Verification.
IEEE J. Sel. Top. Signal Process., 2017

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge.
IEEE J. Sel. Top. Signal Process., 2017

Influence of speaker familiarity on blind and visually impaired children's and young adults' perception of synthetic voices.
Comput. Speech Lang., 2017

An approach for gait anonymization using deep learning.
Proceedings of the 2017 IEEE Workshop on Information Forensics and Security, 2017

Distinguishing computer graphics from natural images using convolution neural networks.
Proceedings of the 2017 IEEE Workshop on Information Forensics and Security, 2017

An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning Word Vector Representations Based on Acoustic Counts.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Generative Adversarial Network-Based Postfilter for STFT Spectrograms.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Principles for Learning Controllable TTS from Annotated and Latent Variation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An autoregressive recurrent mixture density network for parametric speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Adapting and controlling DNN-based speech synthesis using input codes.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Identifying computer-generated text using statistical analysis.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

User Generated Dialogue Systems: uDialogue.
Proceedings of the Human-Harmonized Information Technology, Volume 2, 2017

2016
Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis.
Proceedings of the Recent Advances in Nonlinear Speech Processing, 2016

Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis.
IEICE Trans. Inf. Syst., 2016

ALISA: An automatic lightly supervised speech segmentation and alignment tool.
Comput. Speech Lang., 2016

Multidimensional scaling of systems in the Voice Conversion Challenge 2016.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

A Comparative Study of the Performance of HMM, DNN, and RNN based Speech Synthesis Systems Trained on Very Large Speaker-Dependent Corpora.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Parallel and cascaded deep neural networks for text-to-speech synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Development of a statistical parametric synthesis system for operatic singing in German.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Voice Liveness Detection for Speaker Verification based on a Tandem Single/Double-channel Pop Noise Detector.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Analysis of the Voice Conversion Challenge 2016 Evaluation Results.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The Voice Conversion Challenge 2016.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The SIWIS Database: A Multilingual Speech Database with Acted Emphasis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Deep neural network-guided unit selection synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Initial investigation of speech synthesis based on complex-valued neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Privacy-preserving sound to degrade automatic speaker verification performance.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM.
Proceedings of the COLING 2016, 2016

The NII speech synthesis entry for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

2015
Anti-spoofing, Voice Databases.
Proceedings of the Encyclopedia of Biometrics, Second Edition, 2015

A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Spoofing and countermeasures for speaker verification: A survey.
Speech Commun., 2015

Intelligibility of time-compressed synthetic speech: Compression method and speaking style.
Speech Commun., 2015

Emotion transplantation through adaptation in HMM-based speech synthesis.
Comput. Speech Lang., 2015

Deep Denoising Auto-encoder for Statistical Speech Synthesis.
CoRR, 2015

A Comparison of Manual and Automatic Voice Repair for Individual with Vocal Disabilities.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Human vs machine spoofing detection on wideband and narrowband data.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multiple feed-forward deep neural networks for statistical parametric speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Deep neural network context embeddings for model selection in rich-context HMM synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Reconstructing voices within the multiple-average-voice-model framework.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

SAS: A speaker verification spoofing database containing diverse attacks.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014
Speaker Recognition Anti-spoofing.
Proceedings of the Handbook of Biometric Anti-Spoofing, 2014

Statistical parametric speech synthesis for Ibibio.
Speech Commun., 2014

Combining Vocal Tract Length Normalization With Hierarchical Linear Transformations.
IEEE J. Sel. Top. Signal Process., 2014

Glottal Spectral Separation for Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion.
Comput. Speech Lang., 2014

Intelligibility analysis of fast synthesized speech.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation.
Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

Generating segmental foreign accent.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

DNN-based stochastic postfilter for HMM-based speech synthesis.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Neural net word representations for phrase-break prediction without a part of speech tagger.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Multiple-average-voice-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

A fixed dimension and perceptually based dynamic sinusoidal model of speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Towards Cross-Lingual Emotion Transplantation.
Proceedings of the Advances in Speech and Language Technologies for Iberian Languages, 2014

2013
Articulatory Control of HMM-Based Parametric Speech Synthesis Using Feature-Space-Switched Multiple Regression.
IEEE Trans. Speech Audio Process., 2013

Speech Synthesis Based on Hidden Markov Models.
Proc. IEEE, 2013

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis.
Comput. Speech Lang., 2013

Building personalised synthetic voices for individuals with severe speech impairment.
Comput. Speech Lang., 2013

Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Real-time control of expressive speech synthesis using kinect body tracking.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Using adaptation to improve speech transcription alignment in noisy and reverberant environments.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Towards speaking style transplantation in speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

An experimental comparison of multiple vocoder types.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - HMM-based speech synthesis reactively controlled by the articulators.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - reactive articulatory feature control of HMM-based parametric speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Towards Personalised Synthesised Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction.
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, 2013

The voice bank corpus: Design, collection and data analysis of a large regional accent speech database.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

TUNDRA: a multilingual corpus of found data for TTS research created with light supervision.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

On the evaluation of inversion mapping performance in the acoustic domain.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Spoofing and countermeasures for automatic speaker verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Reactive accent interpolation through an interactive map application.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improving intelligibility in noise of HMM-generated speech via noise-dependent and -independent methods.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Lightly supervised GMM VAD to use audiobook for speech synthesiser.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech.
IEEE Trans. Speech Audio Process., 2012

Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping.
Speech Commun., 2012

Impacts of machine translation and speech synthesis on speech-to-speech translation.
Speech Commun., 2012

Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis.
Speech Commun., 2012

Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Towards Glottal Source Controllability in Expressive Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Analysis of speaker clustering strategies for HMM-based speech synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Combining vocal tract length normalization with hierarchical linear transformations.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011
HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering.
IEEE Trans. Speech Audio Process., 2011

The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate.
Speech Commun., 2011

Unsupervised Continuous-Valued Word Features for Phrase-Break Prediction without a Part-of-Speech Tagger.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Can Objective Measures Predict the Intelligibility of Modified HMM-Based Synthetic Speech in Noise?
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Formant-Controlled HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Evaluation of objective measures for intelligibility prediction of HMM-based synthetic speech in noise.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Detection of synthetic speech for the problem of imposture.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

An analysis of machine translation and speech synthesis in speech-to-speech translation system.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

HMM-based speech synthesiser using the LF-model of the glottal source.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Vocal attractiveness of statistical speech synthesisers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Voice banking and voice reconstruction for MND patients.
Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, 2011

2010
Thousands of Voices for HMM-Based Speech Synthesis-Analysis and Application of TTS Systems Built on Various ASR Corpora.
IEEE Trans. Speech Audio Process., 2010

Synthesis of Child Speech With HMM Adaptation and Voice Conversion.
IEEE Trans. Speech Audio Process., 2010

Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis.
Speech Commun., 2010

An Analysis of HMM-based prediction of articulatory movements.
Speech Commun., 2010

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech.
Speech Commun., 2010

Measuring the Gap Between HMM-Based ASR and TTS.
IEEE J. Sel. Top. Signal Process., 2010

Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Letter-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

An unified and automatic approach of Mandarin HTS system.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

An HMM-based speech synthesiser using glottal post-filtering.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Utilising spontaneous conversational speech in HMM-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech.
Proceedings of the Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, June 28, 2010

Roles of the average voice in speaker-adaptive HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The role of higher-level linguistic features in HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Simple methods for improving speaker-similarity of HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Revisiting the security of speaker verification systems against imposture using synthetic speech.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

The CSTR/EMIME HTS System for Blizzard Challenge.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

2009
Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.
IEEE Trans. Speech Audio Process., 2009

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm.
IEEE Trans. Speech Audio Process., 2009

Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2009

Thousands of voices for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

HMM adaptation and voice conversion for the synthesis of child speech: a comparison.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Identification of contrast and its emphatic realization in HMM based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Speech synthesis without a phone inventory.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Analysis of Unsupervised and Noise-Robust Speaker-Adaptive HMM-Based Speech Synthesis Systems toward a Unified ASR and TTS Framework.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

Glottal Source and Prosodic Prominence Modelling in HMM-based Speech Synthesis for the Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

2008
Phone duration modeling using gradient tree boosting.
Speech Commun., 2008

HMM-based synthesis of child speech.
Proceedings of the First Workshop on Child, Computer and Interaction, 2008

Robustness of HMM-based speech synthesis.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Unsupervised adaptation for HMM-based speech synthesis.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Speech-driven lip motion generation with a trajectory HMM.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Glottal spectral separation for parametric speech synthesis.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge.
Proceedings of the Blizzard Challenge 2008, 2008

2007
Average-Voice-Based Speech Synthesis Using HSMM-Based Speaker Adaptation and Adaptive Training.
IEICE Trans. Inf. Syst., 2007

A Style Control Technique for HMM-Based Expressive Speech Synthesis.
IEICE Trans. Inf. Syst., 2007

The HMM-based speech synthesis system (HTS) version 2.0.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Towards an improved modeling of the glottal source in statistical parametric speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Speech driven head motion synthesis based on a trajectory model.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2007

Performance evaluation of HMM-based style classification with a small amount of training data.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Model Adaptation Approach to Speech Synthesis with Diverse Voices and Styles.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

Festival multisyn voices for the 2007 Blizzard Challenge.
Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

2006
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features.
IEICE Trans. Inf. Syst., 2006

A technique for controlling voice quality of synthetic speech using multiple regression HSMM.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A style control technique for speech synthesis using multiple regression HSMM.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

HSMM-Based Model Adaptation Algorithms for Average-Voice-Based Speech Synthesis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Developing a Test Bed of English Text-to-Speech System XIMERA for the Blizzard Challenge 2006.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006

2005
Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2005

Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing.
IEICE Trans. Inf. Syst., 2005

Human Walking Motion Synthesis with Desired Pace and Stride Length Based on HSMM.
IEICE Trans. Inf. Syst., 2005

Performance evaluation of style adaptation for hidden semi-Markov model based speech synthesis.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Adaptive Training for Hidden Semi-Markov Model.
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

Human Walking Motion Synthesis Based on Multiple Regression Hidden Semi-Markov Model.
Proceedings of the 4th International Conference on Cyberworlds (CW 2005), 2005

2004
MLLR adaptation for hidden semi-Markov model based speech synthesis.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

2003
A Training Method of Average Voice Model for HMM-Based Speech Synthesis.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2003

Modeling of various speaking styles and emotions for HMM-based speech synthesis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

A training method for average voice model based on shared decision tree context clustering and speaker adaptive training.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

2002
A context clustering technique for average voice model in HMM-based speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
