Jindrich Matousek

Lukás Vladar

Proceedings of the Text, Speech, and Dialogue - 28th International Conference, 2025

An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-Shot Multi-speaker TTS.

Marie Kunesová

Proceedings of the Text, Speech, and Dialogue - 28th International Conference, 2025

2024

Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure.

Comput. Intell., April, 2024

T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis.

Lukás Vladar

Proceedings of the Text, Speech, and Dialogue - 27th International Conference, 2024

Sentences vs Phrases in Neural Speech Synthesis.

Proceedings of the Text, Speech, and Dialogue - 27th International Conference, 2024

Zero-Shot vs. Few-Shot Multi-speaker TTS Using Pre-trained Czech SpeechT5 Model.

Proceedings of the Text, Speech, and Dialogue - 27th International Conference, 2024

Homograph Disambiguation with Text-to-Text Transfer Transformer.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

Zero-shot Out-of-domain is No Joke: Lessons Learned in the VoiceMOS 2023 MOS Prediction Challenge.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

VITS: Quality Vs. Speed Analysis.

Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

Neural Speech Synthesis with Enriched Phrase Boundaries.

Marie Kunesová

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Ensemble of Deep Neural Network Models for MOS Prediction.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

T5G2P: Multilingual Grapheme-to-Phoneme Conversion with Text-to-Text Transfer Transformer.

Proceedings of the Pattern Recognition - 7th Asian Conference, 2023

VITS, Tacotron or FastSpeech? Challenging Some of the Most Popular Synthesizers.

Alice Tihelková

Proceedings of the Pattern Recognition - 7th Asian Conference, 2023

2022

Text-to-Text Transfer Transformer Phrasing Model Using Enriched Text Input.

Proceedings of the Text, Speech, and Dialogue - 25th International Conference, 2022

On Comparison of Phonetic Representations for Czech Neural Speech Synthesis.

Proceedings of the Text, Speech, and Dialogue - 25th International Conference, 2022

Phonetic Speech Segmentation of Audiobooks by Using Adapted LSTM-Based Acoustic Models.

Proceedings of the Advances in Artificial Intelligence - IBERAMIA 2022, 2022

Sequence-to-Sequence CNN-BiLSTM Based Glottal Closure Instant Detection from Raw Speech.

Proceedings of the Artificial Neural Networks in Pattern Recognition, 2022

2021

On Comparison of XGBoost and Convolutional Neural Networks for Glottal Closure Instant Detection.

Michal Vrastil

Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

How Much End-to-End is Tacotron 2 End-to-End TTS System.

Alice Tihelková

Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

Human and Transformer-Based Prosodic Phrasing in Two Speech Genres.

Jan Volín

Proceedings of the Speech and Computer - 23rd International Conference, 2021

Save Your Voice: Voice Banking and TTS for Anyone.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Comparison of Convolutional Neural Networks for Glottal Closure Instant Detection from Raw Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

Speech and web-based technology to enhance education for pupils with visual impairment.

J. Multimodal User Interfaces, 2020

Dialogue act based expressive speech synthesis in limited domain for the Czech language.

Informatica (Slovenia), 2020

Synthetic Speech Evaluation by 2D GMM Classification in Pleasure-Arousal Scale.

Proceedings of the 43rd International Conference on Telecommunications and Signal Processing, 2020

Context-Aware XGBoost for Glottal Closure Instant Detection in Speech Signal.

Michal Vrastil

Proceedings of the Text, Speech, and Dialogue - 23rd International Conference, 2020

Synthetic Speech Evaluation by Differential Maps in Pleasure-Arousal Space.

Proceedings of the Speech and Computer - 22nd International Conference, 2020

2019

Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development.

Lang. Resour. Evaluation, 2019

Artefact Determination by GMM-Based Continuous Detection of Emotional Changes in Synthetic Speech.

Proceedings of the 42nd International Conference on Telecommunications and Signal Processing, 2019

Czech Speech Synthesis with Generative Neural Vocoder.

Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

Evaluation of Synthetic Speech by GMM-Based Continuous Detection of Emotional States.

Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

Web-Based Speech Synthesis Editor.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Framework for Conducting Tasks Requiring Human Assessment.

Adam Chýlek

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Using Extreme Gradient Boosting to Detect Glottal Closure Instants in Speech Signal.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Evaluation of Synthetic Speech Quality by Statistical Analysis of Voiced and Unvoiced Part Durations.

Proceedings of the 41st International Conference on Telecommunications and Signal Processing, 2018

Current State of Text-to-Speech System ARTIC: A Decade of Research on the Field of Speech Technologies.

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Automatic Evaluation of Synthetic Speech Quality by a System Based on Statistical Analysis.

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

First Steps Towards Hybrid Speech Synthesis in Czech TTS System ARTIC.

Proceedings of the Speech and Computer - 20th International Conference, 2018

On the Contribution of Articulatory Features to Speech Synthesis.

Martin Matura

Markéta Juzová

Proceedings of the Speech and Computer - 20th International Conference, 2018

Design and Development of Speech Corpora for Air Traffic Control Training.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On the Analysis of Training Data for Wavenet-Based Speech Synthesis.

Jakub Vit

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Anomaly-based annotation error detection in speech-synthesis corpora.

Comput. Speech Lang., 2017

Automatic Classification of Types of Artefacts Arising During the Unit Selection Speech Synthesis.

Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

Automatic Phonetic Segmentation Using the Kaldi Toolkit.

Michal Klíma

Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

Annotation Error Detection: Anomaly Detection vs. Classification.

Proceedings of the Speech and Computer - 19th International Conference, 2017

Classification-Based Detection of Glottal Closure Instants from Speech Signals.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Voice Conservation and TTS System for People Facing Total Laryngectomy.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

WebSubDub - Experimental System for Creating High-Quality Alternative Audio Track for TV Broadcasting.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Comparison of one and two-level architecture of the GMM-based speaker age classifier.

Proceedings of the 39th International Conference on Telecommunications and Signal Processing, 2016

Unit-Selection Speech Synthesis Adjustments for Audiobook-Based Voices.

Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

Evaluation of TTS Personification by GMM-Based Speaker Gender and Age Classifier.

Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

On the Influence of the Number of Anomalous and Normal Examples in Anomaly-Based Annotation Errors Detection.

Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

GMM-based speaker gender and age classification after voice conversion.

Proceedings of the First International Workshop on Sensing, Processing and Learning for Intelligent Machines, 2016

Designing High-Coverage Multi-level Text Corpus for Non-professional-voice Conservation.

Markéta Juzová

Proceedings of the Speech and Computer - 18th International Conference, 2016

Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS Corpora.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

ARET - Automatic Reading of Educational Texts for Visually Impaired Students.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Detection of artefacts in Czech synthetic speech based on ANOVA statistics.

Proceedings of the 38th International Conference on Telecommunications and Signal Processing, 2015

Experiment with GMM-Based Artefact Localization in Czech Synthetic Speech.

Proceedings of the Text, Speech, and Dialogue - 18th International Conference, 2015

Detection of Large Segmentation Errors with Score Predictive Model.

Martin Matura

Proceedings of the Text, Speech, and Dialogue - 18th International Conference, 2015

Anomaly-based annotation errors detection in TTS corpora.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014

Modelling F0 Dynamics in Unit Selection Based Speech Synthesis.

Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

GMM Classification of Text-to-Speech Synthesis: Identification of Original Speaker's Voice.

Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

Quality Improvements of Zero-Concatenation-Cost Chain Based Unit Selection.

Proceedings of the Speech and Computer - 16th International Conference, 2014

Very fast unit selection using Viterbi search with zero-concatenation-cost chains.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Concatenation Artifact Detection Trained from Listeners Evaluations.

Jakub Vit

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

SVM-Based Detection of Misannotated Words in Read Speech Corpora.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Configuring TTS Evaluation Method Based on Unit Cost Outlier Detection.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Experiments on Reducing Footprint of Unit Selection TTS System.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Is unit selection aware of audible artifacts?

Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Improvements in Czech Expressive Speech Synthesis in Limited Domain.

Proceedings of the Speech and Computer - 15th International Conference, 2013

Annotation errors detection in TTS corpora.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

2012

On the Impact of Annotation Errors on Unit-Selection Speech Synthesis.

Lubos Smídl

Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

On the impact of labialization contexts on unit selection speech synthesis.

Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2012

Improving automatic dubbing with subtitle timing optimisation using video cut detection.

Jakub Vit

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

On the detection of pitch marks using a robust multi-phase algorithm.

Speech Commun., 2011

Several Aspects of Machine-Driven Phrasing in Text-to-Speech Systems.

Prague Bull. Math. Linguistics, 2011

Web-Based System for Automatic Reading of Technical Documents for Vision Impaired Students.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Identifying Concatenation Discontinuities by Hierarchical Divisive Clustering of Pitch Contours.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Analysis of Data Collected in Listening Tests for the Purpose of Evaluation of Concatenation Cost Functions.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

2010

Automatic Segmentation of Parasitic Sounds in Speech Corpora for TTS Synthesis.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

Collection and Analysis of Data for Evaluation of Concatenation Cost Functions.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

Listening-Test-Based Annotation of Communicative Functions for Expressive Speech Synthesis.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

Enhancements of Viterbi search for fast unit selection synthesis.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

Automatic Pitch-Synchronous Phonetic Segmentation with Context-Independent HMMs.

Proceedings of the Text, Speech and Dialogue, 12th International Conference, 2009

Design of the Test Stimuli for the Evaluation of Concatenation Cost Functions.

Proceedings of the Text, Speech and Dialogue, 12th International Conference, 2009

First Experiments on Text-to-Speech System Personification.

Proceedings of the Text, Speech and Dialogue, 12th International Conference, 2009

Identification and automatic detection of parasitic speech sounds.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2008

Building of a Speech Corpus Optimised for Unit Selection TTS Synthesis.

Proceedings of the International Conference on Language Resources and Evaluation, 2008

Automatic pitch-synchronous phonetic segmentation.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

2007

Quality Deterioration Factors in Unit Selection Speech Synthesis.

Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

Recording and Annotation of Speech Corpus for Czech Unit Selection Speech Synthesis.

Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

Pitch Marks at Peaks or Valleys?

Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

Voice Conversion Based on Probabilistic Parameter Transformation and Extended Inter-speaker Residual Prediction.

Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

Evaluation of various unit types in the unit selection approach for the Czech language using the Festival system.

Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

A robust multi-phase pitch-mark detection algorithm.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

F0 transformation within the voice conversion framework.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

2006

Design, implementation and evaluation of the Czech realistic audio-visual speech synthesis.

Signal Process., 2006

Diphones vs. Triphones in Czech Unit Selection TTS.

Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

Current State of Czech Text-to-Speech System ARTIC.

Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

First Steps Towards New Czech Voice Conversion System.

Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

Unit selection and its relation to symbolic prosody: a new approach.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

On building phonetically and prosodically rich speech corpus for text-to-speech synthesis.

Proceedings of the Second IASTED International Conference on Computational Intelligence, 2006

2005

Formal Prosodic Structures and Their Application in NLP.

Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005

On Modelling Glottal Stop in Czech Text-to-Speech Synthesis.

Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005

Hybrid syllable/triphone speech synthesis.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Advanced Prosody Modelling.

Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

Slovak Text-to-Speech Synthesis in ARTIC System.

Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

The Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems.

Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Recent improvements on ARTIC: Czech text-to-speech system.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003

Experiments with Automatic Segmentation for Czech Speech Synthesis.

Proceedings of the Text, Speech and Dialogue, 6th International Conference, 2003

Sentence boundary detection in Czech TTS system using neural networks.

Proceedings of the Seventh International Symposium on Signal Processing and Its Applications, 2003

Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

German and Czech Speech Synthesis Using HMM-Based Speech Segment Database.

Proceedings of the Text, Speech and Dialogue, 5th International Conference, 2002

2001

Large broadcast news and read speech corpora of spoken Czech.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Design of speech corpus for text-to-speech synthesis.

Jiri Kruta

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

2000

Building a New Czech Text-to-Speech System Using Triphone-Based Speech Units.

Proceedings of the Text, Speech and Dialogue - Third International Workshop, 2000

ARTIC: a new Czech text-to-speech system using statistical approach to speech segment database construction.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999

Statistical Approach to the Automatic Synthesis of Czech Speech.

Zbynek Tychtl

Proceedings of the Text, Speech and Dialogue - Second International Workshop, 1999

Speech synthesis using HMM-based acoustic unit inventory.
