Heiga Zen

According to our database, Heiga Zen authored at least 116 papers between 2002 and 2024.

Bibliography

2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.
CoRR, 2024

2023
Twenty-Five Years of Evolution in Speech and Language Processing.
IEEE Signal Process. Mag., July, 2023

Extracting representative subset from extensive text data for training pre-trained language models.
Inf. Process. Manag., May, 2023

Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion.
IEEE Trans. Affect. Comput., 2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.
CoRR, 2023

Translatotron 3: Speech to Speech Translation with Monolingual Data.
CoRR, 2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.
Proceedings of the IEEE International Conference on Acoustics, 2023

SayTap: Language to Quadrupedal Locomotion.
Proceedings of the Conference on Robot Learning, 2023

2022
Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation.
CoRR, 2022

WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping.
Proceedings of the Interspeech 2022, 2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.
Proceedings of the Interspeech 2022, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.
Proceedings of the Interspeech 2022, 2022

2021
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.
CoRR, 2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

WaveGrad: Estimating Gradients for Waveform Generation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Parallel Tacotron: Non-Autoregressive and Controllable TTS.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling.
CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.
CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.
IEEE Signal Process. Mag., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.
Proceedings of the Interspeech 2019, 2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.
Proceedings of the Interspeech 2019, 2019

Hierarchical Generative Modeling for Controllable Speech Synthesis.
Proceedings of the 7th International Conference on Learning Representations, 2019

Sample Efficient Adaptive Text-to-Speech.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents.
Proceedings of the Interspeech 2018, 2018

[Invited] Generative Model-Based Text-to-Speech Synthesis.
Proceedings of the IEEE 7th Global Conference on Consumer Electronics, 2018

2017
Speech Research at Google to Enable Universal Speech Interfaces.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
WaveNet: A Generative Model for Raw Audio.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices.
Proceedings of the Interspeech 2016, 2016

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

Directly modeling voiced and unvoiced components in speech waveforms by neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Autoregressive Models for Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2013

Speech Synthesis Based on Hidden Markov Models.
Proc. IEEE, 2013

Deep learning in speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Statistical parametric speech synthesis using deep neural networks.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Product of Experts for Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2012

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization.
IEEE Trans. Speech Audio Process., 2012

Combining multiple high quality corpora for improving HMM-TTS.
Proceedings of the INTERSPEECH 2012, 2012

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Continuous Stochastic Feature Mapping Based on Trajectory HMMs.
IEEE Trans. Speech Audio Process., 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.
Speech Commun., 2011

Bayesian Context Clustering Using Cross Validation for Speech Recognition.
IEICE Trans. Inf. Syst., 2011

The Effect of Using Normalized Models in Statistical Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

Gaussian Process Experts for Voice Conversion.
Proceedings of the INTERSPEECH 2011, 2011

Multipulse Sequences for Residual Signal Modeling.
Proceedings of the INTERSPEECH 2011, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

Decision tree-based context clustering based on cross validation and hierarchical priors.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
A Covariance-Tying Technique for HMM-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2010

HMM-based polyglot speech synthesis by speaker and language adaptive training.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Speaker and language adaptive training for HMM-based polyglot speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

Context adaptive training with factorized decision trees for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

An implementation of decision tree-based context clustering on graphics processing units.
Proceedings of the INTERSPEECH 2010, 2010

Training a parametric-based logF0 model with the minimum generation error criterion.
Proceedings of the INTERSPEECH 2010, 2010

Statistical parametric speech synthesis based on product of experts.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.
IEEE Trans. Speech Audio Process., 2009

Statistical parametric speech synthesis.
Speech Commun., 2009

Context-dependent additive log f_0 model for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2009, 2009

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems.
Proceedings of the INTERSPEECH 2009, 2009

Stereo-based stochastic noise compensation based on trajectory GMMs.
Proceedings of the IEEE International Conference on Acoustics, 2009

A Bayesian approach to HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006.
IEICE Trans. Inf. Syst., 2008

A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System.
IEICE Trans. Inf. Syst., 2008

Probabilistic feature mapping based on trajectory HMMs.
Proceedings of the INTERSPEECH 2008, 2008

Acoustic modeling based on model structure annealing for speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Unsupervised adaptation for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2008, 2008

Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.
Proceedings of the IEEE International Conference on Acoustics, 2008

Acoustic modeling with contextual additive structure for HMM-based speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005.
IEICE Trans. Inf. Syst., 2007

A Hidden Semi-Markov Model-Based Speech Synthesis System.
IEICE Trans. Inf. Syst., 2007

State Duration Modeling for HMM-Based Speech Synthesis.
IEICE Trans. Inf. Syst., 2007

Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences.
Comput. Speech Lang., 2007

The HMM-based speech synthesis system (HTS) version 2.0.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An excitation model for HMM-based speech synthesis based on residual modeling.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Model-space MLLR for trajectory HMMs.
Proceedings of the INTERSPEECH 2007, 2007

A trainable excitation model for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2007, 2007

2006
Speaker adaptation of trajectory HMMs using feature-space MLLR.
Proceedings of the INTERSPEECH 2006, 2006

An HMM-based singing voice synthesis system.
Proceedings of the INTERSPEECH 2006, 2006

Estimating Trajectory HMM Parameters Using Monte Carlo EM with Gibbs Sampler.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Hidden Semi-Markov Model Based Speech Recognition System using Weighted Finite-State Transducer.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Simultaneous clustering of phonetic context, dimension, and state position for acoustic modeling using decision trees.
Syst. Comput. Jpn., 2005

Continuous Speech Recognition Based on General Factor Dependent Acoustic Models.
IEICE Trans. Inf. Syst., 2005

Applying Sparse KPCA for Feature Extraction in Speech Recognition.
IEICE Trans. Inf. Syst., 2005

Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition.
IEICE Trans. Inf. Syst., 2005

An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005.
Proceedings of the INTERSPEECH 2005, 2005

On building a concatenative speech synthesis system from the Blizzard Challenge speech databases.
Proceedings of the INTERSPEECH 2005, 2005

Sparse KPCA for Feature Extraction in Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
On the Use of Kernel PCA for Feature Extraction in Speech Recognition.
IEICE Trans. Inf. Syst., 2004

An introduction of trajectory model into HMM-based speech synthesis.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Hidden semi-Markov model based speech synthesis.
Proceedings of the INTERSPEECH 2004, 2004

Constructing emotional speech synthesizers with limited speech database.
Proceedings of the INTERSPEECH 2004, 2004

Deterministic annealing EM algorithm in parameter estimation for acoustic model.
Proceedings of the INTERSPEECH 2004, 2004

A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003
Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Towards the development of a Brazilian Portuguese text-to-speech system based on HMM.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech recognition using voice-characteristic-dependent acoustic models.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Improving the performance of HMM-based very low bit rate speech coding.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Decision tree distribution tying based on a dimensional split technique.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

