Simon King

Orcid: 0000-0002-2694-2843

Affiliations:
  • University of Edinburgh, Centre for Speech Technology Research, Scotland, UK


According to our database, Simon King authored at least 240 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
The limits of the Mean Opinion Score for speech synthesis evaluation.
Comput. Speech Lang., March, 2024

Natural language guidance of high-fidelity text-to-speech with synthetic annotations.
CoRR, 2024

2023
Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing.
CoRR, 2023

Using a Large Language Model to Control Speaking Style for Expressive TTS.
CoRR, 2023

Do Prosody Transfer Models Transfer Prosody?
CoRR, 2023

Cognitive Load of Modern TTS Systems Under Noisy Conditions.
Proceedings of the International Workshop on Cognitive AI 2023 co-located with the 3rd International Conference on Learning & Reasoning (IJCLR 2023), 2023

Autovocoder: Fast Waveform Generation from a Learned Speech Representation Using Differentiable Digital Signal Processing.
Proceedings of the IEEE International Conference on Acoustics, 2023

Ensemble Prosody Prediction For Expressive Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

Do Prosody Transfer Models Transfer Prosody?
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis.
Proceedings of the Interspeech 2022, 2022

Back to the Future: Extending the Blizzard Challenge 2013.
Proceedings of the Interspeech 2022, 2022

Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech.
Proceedings of the Interspeech 2022, 2022

2021
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Detection and Analysis of Attention Errors in Sequence-to-Sequence Text-to-Speech.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

ADEPT: A Dataset for Evaluating Prosody Transfer.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis.
Proceedings of the Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August, 2021

2020
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Using previous acoustic context to improve Text-to-Speech synthesis.
CoRR, 2020

Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0.
CoRR, 2020

Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification.
Proceedings of the Interspeech 2020, 2020

An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets.
Proceedings of the Interspeech 2020, 2020

Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis.
Proceedings of the Interspeech 2020, 2020

A Sound Engineering Approach to Near End Listening Enhancement.
Proceedings of the Interspeech 2020, 2020

Speaker Adaptation of a Multilingual Acoustic Model for Cross-Language Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Persuasive Synthetic Speech: Voice Perception and User Behaviour.
Proceedings of the 2nd Conference on Conversational User Interfaces, 2020

2019
Enriched communication across the lifespan.
Proces. del Leng. Natural, 2019

Using generative modelling to produce varied intonation for speech synthesis.
CoRR, 2019

Disentangling Style Factors from Speaker Representations.
Proceedings of the Interspeech 2019, 2019

Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise.
Proceedings of the Interspeech 2019, 2019

Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data.
Proceedings of the Interspeech 2019, 2019

Evaluating Near End Listening Enhancement Algorithms in Realistic Environments.
Proceedings of the Interspeech 2019, 2019

Improving Speech Synthesis with Discourse Relations.
Proceedings of the Interspeech 2019, 2019

Speech Waveform Reconstruction Using Convolutional Neural Networks with Noise and Periodic Inputs.
Proceedings of the IEEE International Conference on Acoustics, 2019

Attentive Filtering Networks for Audio Replay Attack Detection.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Analysing Shortcomings of Statistical Parametric Speech Synthesis.
CoRR, 2018

Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments.
CoRR, 2018

Examplar-Based Speechwaveform Generation for Text-To-Speech.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Exemplar-based Speech Waveform Generation.
Proceedings of the Interspeech 2018, 2018

Impact of Different Speech Types on Listening Effort.
Proceedings of the Interspeech 2018, 2018

Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data.
Proceedings of the Interspeech 2018, 2018

Measuring the Cognitive Load of Synthetic Speech Using a Dual Task Paradigm.
Proceedings of the Interspeech 2018, 2018

Using Pupillometry to Measure the Cognitive Load of Synthetic Speech.
Proceedings of the Interspeech 2018, 2018

2017
Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation With Limited Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust Speech Recognition.
IEEE Signal Process. Lett., 2017

A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2017, 2017

Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili.
Proceedings of the Interspeech 2017, 2017

Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2017, 2017

The Blizzard Machine Learning Challenge 2017.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

ALISA: An automatic lightly supervised speech segmentation and alignment tool.
Comput. Speech Lang., 2016

Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Trajectory Error Training.
CoRR, 2016

Investigating gated recurrent neural networks for speech synthesis.
CoRR, 2016

Merlin: An Open Source Neural Network Speech Synthesis System.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

DNN-based Speech Synthesis for Indian Languages from ASCII text.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Median-based generation of synthetic speech durations using a non-parametric approach.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs.
Proceedings of the Interspeech 2016, 2016

The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms.
Proceedings of the Interspeech 2016, 2016

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis.
Proceedings of the Interspeech 2016, 2016

Investigating gated recurrent networks for speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

From HMMs to DNNs: Where do the improvements come from?
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Smooth talking: Articulatory join costs for unit selection.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Deep neural network-guided unit selection synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Robust TTS duration modelling using DNNs.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Soft context clustering for F0 modeling in HMM-based speech synthesis.
EURASIP J. Adv. Signal Process., 2015

A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification.
Comput. Speech Lang., 2015

Phonetic segmentation of speech using STEP and t-SNE.
Proceedings of the International Conference on Speech Technology and Human-Computer Dialogue, 2015

A Comparison of Manual and Automatic Voice Repair for Individual with Vocal Disabilities.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

A study of speaker adaptation for DNN-based speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features.
Proceedings of the INTERSPEECH 2015, 2015

Sentence-level control vectors for deep neural network speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Towards minimum perceptual error training for DNN-based speech synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Deep neural network context embeddings for model selection in rich-context HMM synthesis.
Proceedings of the INTERSPEECH 2015, 2015

Reconstructing voices within the multiple-average-voice-model framework.
Proceedings of the INTERSPEECH 2015, 2015

Robustness to additive noise of locally-normalized cepstral coefficients in speaker verification.
Proceedings of the INTERSPEECH 2015, 2015

What speech synthesis can do for you (and what you can do for speech synthesis).
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

SAS: A speaker verification spoofing database containing diverse attacks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Statistical parametric speech synthesis for Ibibio.
Speech Commun., 2014

Introduction to the Issue on Statistical Parametric Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis.
EURASIP J. Audio Speech Music. Process., 2014

Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion.
Comput. Speech Lang., 2014

Feature analysis for discriminative confidence estimation in spoken term detection.
Comput. Speech Lang., 2014

Introduction to the Special Issue on The listening talker: context-dependent speech production and perception.
Comput. Speech Lang., 2014

The listening talker: A review of human and algorithmic context-induced modifications of speech.
Comput. Speech Lang., 2014

Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis.
Proceedings of the INTERSPEECH 2014, 2014

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation.
Proceedings of the 2nd International Workshop on Speech, Language and Audio in Multimedia, 2014

Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech.
Proceedings of the INTERSPEECH 2014, 2014

A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis.
Proceedings of the INTERSPEECH 2014, 2014

Investigating automatic & human filled pause insertion for speech synthesis.
Proceedings of the INTERSPEECH 2014, 2014

Neural net word representations for phrase-break prediction without a part of speech tagger.
Proceedings of the IEEE International Conference on Acoustics, 2014

Multiple-average-voice-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

Voice source modelling using deep neural networks for statistical parametric speech synthesis.
Proceedings of the 22nd European Signal Processing Conference, 2014

2013
Cross-Lingual Automatic Speech Recognition Using Tandem Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2013

Recording speech articulation in dialogue: Evaluating a synchronized double electromagnetic articulography setup.
J. Phonetics, 2013

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis.
Comput. Speech Lang., 2013

Noise robustness in HMM-TTS speaker adaptation.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Multilingual number transcription for text-to-speech conversion.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Investigating the shortcomings of HMM synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Using adaptation to improve speech transcription alignment in noisy and reverberant environments.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - HMM-based speech synthesis reactively controlled by the articulators.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Mage - reactive articulatory feature control of HMM-based parametric speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Towards Personalised Synthesised Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction.
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, 2013

The voice bank corpus: Design, collection and data analysis of a large regional accent speech database.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise.
Proceedings of the INTERSPEECH 2013, 2013

TUNDRA: a multilingual corpus of found data for TTS research created with light supervision.
Proceedings of the INTERSPEECH 2013, 2013

Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data.
Proceedings of the INTERSPEECH 2013, 2013

The Edinburgh Speech Production Facility DoubleTalk corpus.
Proceedings of the INTERSPEECH 2013, 2013

Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech.
Proceedings of the INTERSPEECH 2013, 2013

Reactive accent interpolation through an interactive map application.
Proceedings of the INTERSPEECH 2013, 2013

Improving intelligibility in noise of HMM-generated speech via noise-dependent and -independent methods.
Proceedings of the IEEE International Conference on Acoustics, 2013

Where are the challenges in speaker diarization?
Proceedings of the IEEE International Conference on Acoustics, 2013

Lightly supervised GMM VAD to use audiobook for speech synthesiser.
Proceedings of the IEEE International Conference on Acoustics, 2013

Factorized context modelling for Text-to-Speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2013

Discriminative tandem features for HMM-based EEG classification.
Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2013

2012
Direct posterior confidence for out-of-vocabulary spoken term detection.
ACM Trans. Inf. Syst., 2012

Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping.
Speech Commun., 2012

Impacts of machine translation and speech synthesis on speech-to-speech translation.
Speech Commun., 2012

Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection.
J. Comput. Sci. Technol., 2012

A grapheme-based method for automatic alignment of speech and text data.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders.
Proceedings of the INTERSPEECH 2012, 2012

Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise.
Proceedings of the INTERSPEECH 2012, 2012

Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Detecting Acronyms from Capital Letter Sequences in Spanish.
Proceedings of the INTERSPEECH 2012, 2012

Using Bayesian Networks to find relevant context features for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2012, 2012

Analysis of speaker clustering strategies for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2012, 2012

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Stochastic Pronunciation Modeling for Out-of-Vocabulary Spoken Term Detection.
IEEE Trans. Speech Audio Process., 2011

Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields.
IEEE Signal Process. Lett., 2011

The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate.
Speech Commun., 2011

Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis.
Speech Commun., 2011

Unsupervised Continuous-Valued Word Features for Phrase-Break Prediction without a Part-of-Speech Tagger.
Proceedings of the INTERSPEECH 2011, 2011

Can Objective Measures Predict the Intelligibility of Modified HMM-Based Synthetic Speech in Noise?
Proceedings of the INTERSPEECH 2011, 2011

Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus.
Proceedings of the INTERSPEECH 2011, 2011

Formant-Controlled HMM-Based Speech Synthesis.
Proceedings of the INTERSPEECH 2011, 2011

Handling overlaps in spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2011

Evaluation of objective measures for intelligibility prediction of HMM-based synthetic speech in noise.
Proceedings of the IEEE International Conference on Acoustics, 2011

An analysis of machine translation and speech synthesis in speech-to-speech translation system.
Proceedings of the IEEE International Conference on Acoustics, 2011

Vocal attractiveness of statistical speech synthesisers.
Proceedings of the IEEE International Conference on Acoustics, 2011

Voice banking and voice reconstruction for MND patients.
Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, 2011

2010
Thousands of Voices for HMM-Based Speech Synthesis - Analysis and Application of TTS Systems Built on Various ASR Corpora.
IEEE Trans. Speech Audio Process., 2010

Synthesis of Child Speech With HMM Adaptation and Voice Conversion.
IEEE Trans. Speech Audio Process., 2010

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech.
Speech Commun., 2010

Measuring the Gap Between HMM-Based ASR and TTS.
IEEE J. Sel. Top. Signal Process., 2010

Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Letter-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Speech synthesis without the right data.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Direct posterior confidence for out-of-vocabulary spoken term detection.
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech, 2010

Roles of the average voice in speaker-adaptive HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

The role of higher-level linguistic features in HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2010, 2010

CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection.
Proceedings of the INTERSPEECH 2010, 2010

Augmented set of features for confidence estimation in spoken term detection.
Proceedings of the INTERSPEECH 2010, 2010

A classifier-based target cost for unit selection speech synthesis trained on perceptual data.
Proceedings of the INTERSPEECH 2010, 2010

Simple methods for improving speaker-similarity of HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010

Stochastic pronunciation modelling and soft match for out-of-vocabulary spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2010

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010


2009
Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis.
IEEE Trans. Speech Audio Process., 2009

Thousands of voices for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2009, 2009

HMM adaptation and voice conversion for the synthesis of child speech: a comparison.
Proceedings of the INTERSPEECH 2009, 2009

Term-dependent confidence for out-of-vocabulary term detection.
Proceedings of the INTERSPEECH 2009, 2009

Stochastic pronunciation modelling for spoken term detection.
Proceedings of the INTERSPEECH 2009, 2009

A posterior probability-based system hybridisation and combination for spoken term detection.
Proceedings of the INTERSPEECH 2009, 2009

Speech synthesis without a phone inventory.
Proceedings of the INTERSPEECH 2009, 2009

Posterior-based confidence measures for spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2009

Diagonal priors for full covariance speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
A comparison of grapheme and phoneme-based units for Spanish spoken term detection.
Speech Commun., 2008

Bayesian networks for phone duration prediction.
Speech Commun., 2008

HMM-based synthesis of child speech.
Proceedings of the 1st Workshop on Child, Computer and Interaction, 2008

Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Robustness of HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2008, 2008

A posterior approach for microphone array based speech recognition.
Proceedings of the INTERSPEECH 2008, 2008

Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian.
Proceedings of the INTERSPEECH 2008, 2008

Investigating Festival's target cost function using perceptual experiments.
Proceedings of the INTERSPEECH 2008, 2008

Unsupervised adaptation for HMM-based speech synthesis.
Proceedings of the INTERSPEECH 2008, 2008

Growing bottleneck features for tandem ASR.
Proceedings of the INTERSPEECH 2008, 2008

Covariance updates for discriminative training by constrained line search.
Proceedings of the INTERSPEECH 2008, 2008

A shrinkage estimator for speech recognition with full covariance HMMs.
Proceedings of the INTERSPEECH 2008, 2008

A comparison of phone and grapheme-based spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Speech Recognition Using Linear Dynamic Models.
IEEE Trans. Speech Audio Process., 2007

Multisyn: Open-domain unit selection for the Festival speech synthesis system.
Speech Commun., 2007

Factoring Gaussian precision matrices for linear dynamic models.
Pattern Recognit. Lett., 2007

Articulatory feature recognition using dynamic Bayesian networks.
Comput. Speech Lang., 2007

Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Single speaker segmentation and inventory selection using dynamic time warping self organization and joint multigram mapping.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Modelling prominence and emphasis improves unit-selection synthesis.
Proceedings of the INTERSPEECH 2007, 2007

Articulatory feature classifiers trained on 2000 hours of telephone speech.
Proceedings of the INTERSPEECH 2007, 2007

Sparse Gaussian graphical models for speech recognition.
Proceedings of the INTERSPEECH 2007, 2007

Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer workshop.
Proceedings of the IEEE International Conference on Acoustics, 2007

Manual Transcription of Conversational Speech at the Articulatory Feature Level.
Proceedings of the IEEE International Conference on Acoustics, 2007

An Articulatory Feature-Based Tandem Approach and Factored Observation Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2007

Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis.
IEEE Trans. Speech Audio Process., 2006

Observation process adaptation for linear dynamic models.
Speech Commun., 2006

Expressive prosody for unit-selection speech synthesis.
Proceedings of the INTERSPEECH 2006, 2006

Joint prosodic and segmental unit selection speech synthesis.
Proceedings of the INTERSPEECH 2006, 2006

2005
Inductive String Template-Based Learning of Spoken Language.
Proceedings of the Pattern Recognition in Information Systems, 2005

Multidimensional scaling of listener responses to synthetic speech.
Proceedings of the INTERSPEECH 2005, 2005

SVitchboard 1: small vocabulary tasks from Switchboard.
Proceedings of the INTERSPEECH 2005, 2005

Predicting consonant duration with Bayesian belief networks.
Proceedings of the INTERSPEECH 2005, 2005

A hybrid ANN/DBN approach to articulatory feature recognition.
Proceedings of the INTERSPEECH 2005, 2005

Multisyn voices from ARCTIC data for the Blizzard Challenge.
Proceedings of the INTERSPEECH 2005, 2005

Genetic triangulation of graphical models for speech and language processing.
Proceedings of the INTERSPEECH 2005, 2005

Detection of Symbolic Gestural Events in Articulatory Data for Use in Structural Representations of Continuous Speech.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Subjective evaluation of join cost & smoothing methods.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Accurate spectral envelope estimation for articulation-to-speech synthesis.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Festival 2 - build your own general purpose unit selection speech synthesiser.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Subjective evaluation of join cost functions used in unit selection speech synthesis.
Proceedings of the INTERSPEECH 2004, 2004

Source-filter separation for articulation-to-speech synthesis.
Proceedings of the INTERSPEECH 2004, 2004

Estimating detailed spectral envelopes using articulatory clustering.
Proceedings of the INTERSPEECH 2004, 2004

Phone classification in pseudo-euclidean vector spaces.
Proceedings of the INTERSPEECH 2004, 2004

Structural Representation of Speech for Phonetic Classification.
Proceedings of the 17th International Conference on Pattern Recognition, 2004

2003
Dependence and independence in automatic speech recognition and synthesis.
J. Phonetics, 2003

Modelling the uncertainty in recovering articulation from acoustics.
Comput. Speech Lang., 2003

Kalman-filter based join cost for unit-selection speech synthesis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Estimation of voice source and vocal tract characteristics based on multi-frame analysis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Estimating the spectral envelope of voiced speech using multi-frame analysis.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Discriminative methods for improving named entity extraction on speech data.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Named entity extraction from word lattices.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Transforming F0 contours.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Transforming voice quality.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Objective distance measures for spectral discontinuities in concatenative speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Framewise phone classification using support vector machines.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
ASR - articulatory speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000
Detection of phonological features in continuous speech using neural networks.
Comput. Speech Lang., 2000

An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998
Speech recognition via phonetically featured syllables.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997
Using intonation to constrain language models in speech recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

Speech synthesis using non-uniform units in the Verbmobil project.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Using prosodic information to constrain language models for spoken dialogue.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

