Heiga Zen

Michelle Tadmor Ramanovich

Michiel Bacchiani

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2025

SimulTron: On-Device Simultaneous Speech to Speech Translation.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback.

CoRR, 2024

Geometric-Averaged Preference Optimization for Soft Preference Labels.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data.

Proceedings of the IEEE International Conference on Acoustics, 2024

Translatotron 3: Speech to Speech Translation with Monolingual Data.

Eliya Nachmani

Alon Levkovitch

Yifan Ding

Chulayuth Asawaroengchai

Michelle Tadmor Ramanovich

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Twenty-Five Years of Evolution in Speech and Language Processing.

IEEE Signal Process. Mag., July, 2023

Extracting representative subset from extensive text data for training pre-trained language models.

Jun Suzuki

Michelle Tadmor Ramanovich

Hideto Kazawa

Inf. Process. Manag., May, 2023

Guest Editorial: Special Issue on Affective Speech and Language Synthesis, Generation, and Conversion.

IEEE Trans. Affect. Comput., 2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms.

Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech.

Mark Hasegawa-Johnson

Philipp Olbrich

Proceedings of the IEEE International Conference on Acoustics, 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

SayTap: Language to Quadrupedal Locomotion.

Proceedings of the Conference on Robot Learning, 2023

2022

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation.

CoRR, 2022

WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

CVSS Corpus and Massively Multilingual Speech-to-Speech Translation.

Ye Jia

Quan Wang

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MAESTRO: Matched Speech Text Representations through Modality Matching.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

WaveGrad: Estimating Gradients for Waveform Generation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Parallel Tacotron: Non-Autoregressive and Controllable TTS.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling.

CoRR, 2020

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior.

CoRR, 2020

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.

IEEE Signal Process. Mag., 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

CoRR, 2019

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Hierarchical Generative Modeling for Controllable Speech Synthesis.

Proceedings of the 7th International Conference on Learning Representations, 2019

Sample Efficient Adaptive Text-to-Speech.

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents.

Antoine Bruguier

Arkady Arkhangorodsky

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Parallel WaveNet: Fast High-Fidelity Speech Synthesis.

Proceedings of the 35th International Conference on Machine Learning, 2018

[Invited] Generative Model-Based Text-to-Speech Synthesis.

Proceedings of the IEEE 7th Global Conference on Consumer Electronics, 2018

2017

Speech Research at Google to Enable Universal Speech Interfaces.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

WaveNet: A Generative Model for Raw Audio.

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis.

Hideki Kawahara

Yannis Agiomyrgiannakis

Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices.

Yannis Agiomyrgiannakis

Niels Egberts

Fergus Henderson

Przemyslaw Szczepaniak

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis.

Bo Li

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Directly modeling voiced and unvoiced components in speech waveforms by neural networks.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.

IEEE Signal Process. Mag., 2015

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.

Hasim Sak

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis.

Andrew W. Senior

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Autoregressive Models for Statistical Parametric Speech Synthesis.

Matt Shannon

William Byrne

IEEE Trans. Speech Audio Process., 2013

Speech Synthesis Based on Hidden Markov Models.

Proc. IEEE, 2013

Deep learning in speech synthesis.

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Statistical parametric speech synthesis using deep neural networks.

Andrew W. Senior

Mike Schuster

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Product of Experts for Statistical Parametric Speech Synthesis.

IEEE Trans. Speech Audio Process., 2012

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization.

Cassia Valentini-Botinhao

Norbert Braunschweiler

IEEE Trans. Speech Audio Process., 2012

Combining multiple high quality corpora for improving HMM-TTS.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Continuous Stochastic Feature Mapping Based on Trajectory HMMs.

IEEE Trans. Speech Audio Process., 2011

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis.

Speech Commun., 2011

Bayesian Context Clustering Using Cross Validation for Speech Recognition.

IEICE Trans. Inf. Syst., 2011

The Effect of Using Normalized Models in Statistical Speech Synthesis.

Matt Shannon

William J. Byrne

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Gaussian Process Experts for Voice Conversion.

Nicholas Pilkington

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Multipulse Sequences for Residual Signal Modeling.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Decision tree-based context clustering based on cross validation and hierarchical priors.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

A Covariance-Tying Technique for HMM-Based Speech Synthesis.

IEICE Trans. Inf. Syst., 2010

HMM-based polyglot speech synthesis by speaker and language adaptive training.

Norbert Braunschweiler

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters.

Ranniery Maia

Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Speaker and language adaptive training for HMM-based polyglot speech synthesis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Context adaptive training with factorized decision trees for HMM-based speech synthesis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An implementation of decision tree-based context clustering on graphics processing units.

Nicholas Pilkington

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Training a parametric-based logF0 model with the minimum generation error criterion.

Javier Latorre

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Statistical parametric speech synthesis based on product of experts.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.

IEEE Trans. Speech Audio Process., 2009

Context-dependent additive log F0 model for HMM-based speech synthesis.

Norbert Braunschweiler

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Stereo-based stochastic noise compensation based on trajectory GMMs.

Proceedings of the IEEE International Conference on Acoustics, 2009

A Bayesian approach to HMM-based speech synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System.

IEICE Trans. Inf. Syst., 2008

Probabilistic feature mapping based on trajectory HMMs.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Acoustic modeling based on model structure annealing for speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Unsupervised adaptation for HMM-based speech synthesis.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS 2007" for the Blizzard Challenge 2007.

Proceedings of the IEEE International Conference on Acoustics, 2008

Acoustic modeling with contextual additive structure for HMM-based speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2008

The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge.

Proceedings of the Blizzard Challenge 2008, 2008

2007

Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005.

IEICE Trans. Inf. Syst., 2007

A Hidden Semi-Markov Model-Based Speech Synthesis System.

IEICE Trans. Inf. Syst., 2007

State Duration Modeling for HMM-Based Speech Synthesis.

IEICE Trans. Inf. Syst., 2007

Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences.

Comput. Speech Lang., 2007

The HMM-based speech synthesis system (HTS) version 2.0.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An excitation model for HMM-based speech synthesis based on residual modeling.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Model-space MLLR for trajectory HMMs.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

A trainable excitation model for HMM-based speech synthesis.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Statistical Parametric Speech Synthesis.

Alan W. Black

Proceedings of the IEEE International Conference on Acoustics, 2007

Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007.

Proceedings of the Evaluation of text-to-speech systems: Blizzard Challenge 2007, 2007

2006

Speaker adaptation of trajectory HMMs using feature-space MLLR.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

An HMM-based singing voice synthesis system.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Estimating Trajectory HMM Parameters Using Monte Carlo EM with Gibbs Sampler.

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

Hidden Semi-Markov Model Based Speech Recognition System using Weighted Finite-State Transducer.

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006.

Tomoki Toda

Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006

2005

Simultaneous clustering of phonetic context, dimension, and state position for acoustic modeling using decision trees.

Syst. Comput. Jpn., 2005

Continuous Speech Recognition Based on General Factor Dependent Acoustic Models.

IEICE Trans. Inf. Syst., 2005

Applying Sparse KPCA for Feature Extraction in Speech Recognition.

IEICE Trans. Inf. Syst., 2005

Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition.

IEICE Trans. Inf. Syst., 2005

An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005.

Tomoki Toda

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On building a concatenative speech synthesis system from the Blizzard Challenge speech databases.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Sparse KPCA for Feature Extraction in Speech Recognition.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

An introduction of trajectory model into HMM-based speech synthesis.

Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Hidden semi-Markov model based speech synthesis.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Constructing emotional speech synthesizers with limited speech database.

Murtaza Bulut

Shrikanth S. Narayanan

Ryosuke Tsuzuki

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Deterministic annealing EM algorithm in parameter estimation for acoustic model.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features.

Fernando Gil Vianna Resende Jr.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Towards the development of a Brazilian Portuguese text-to-speech system based on HMM.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

On the use of kernel PCA for feature extraction in speech recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Speech recognition using voice-characteristic-dependent acoustic models.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Improving the performance of HMM-based very low bit rate speech coding.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Decision tree distribution tying based on a dimensional split technique.
