Josef V. Psutka

Marie Kunesová

CoRR, June, 2025

2024

A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

2023

Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak.

Jan Lehecka

Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech.

CoRR, 2022

Transformer-Based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project.

Jan Lehecka

Proceedings of the Text, Speech, and Dialogue - 25th International Conference, 2022

2021

CNN-TDNN-Based Architecture for Speech Recognition Using Grapheme Models in Bilingual Czech-Slovak Task.

Jan Svec

Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

Recognition of Heavily Accented and Emotional Speech of English and Czech Holocaust Survivors Using Various DNN Architectures.

Proceedings of the Speech and Computer - 23rd International Conference, 2021

Various DNN-HMM Architectures Used in Acoustic Modeling with Single-Speaker and Single-Channel.

Proceedings of the Statistical Language and Speech Processing, 2021

Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Live TV Subtitling Through Respeaking.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Live TV subtitling through respeaking with remote cutting-edge technology.

Multim. Tools Appl., 2020

Complexity of the TDNN Acoustic Model with Respect to the HMM Topology.

Proceedings of the Text, Speech, and Dialogue, 2020

Diarization Based on Identification with X-Vectors.

Zbynek Zajíc

Ludek Müller

Proceedings of the Speech and Computer - 22nd International Conference, 2020

Increasing the Accuracy of the ASR System by Prolonging Voiceless Phonemes in the Speech of Patients Using the Electrolarynx.

Petr Stanislav

Proceedings of the Speech and Computer - 22nd International Conference, 2020

2019

Sample size for maximum-likelihood estimates of Gaussian model depending on dimensionality of pattern space.

Pattern Recognit., 2019

Diarization of the Language Consulting Center Telephone Calls.

Proceedings of the Speech and Computer - 21st International Conference, 2019

2018

First Insight into the Processing of the Language Consulting Center Data.

Proceedings of the Speech and Computer - 20th International Conference, 2018

Towards Processing of the Oral History Interviews and Related Printed Documents.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

On the Use of Grapheme Models for Searching in Large Spoken Archives.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Recognition of the Electrolaryngeal Speech: Comparison Between Human and Machine.

Petr Stanislav

Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

An Analysis of the RNN-Based Spoken Term Detection Training.

Jan Svec

Lubos Smídl

Proceedings of the Speech and Computer - 19th International Conference, 2017

A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2015

Sample Size for Maximum Likelihood Estimates of Gaussian Model.

Proceedings of the Computer Analysis of Images and Patterns, 2015

Gaussian Mixture Model Selection Using Multiple Random Subsampling with Initialization.

Proceedings of the Computer Analysis of Images and Patterns, 2015

2014

Captioning of Live TV Commentaries from the Olympic Games in Sochi: Some Interesting Insights.

Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

2013

Online Speaker Adaptation of an Acoustic Model Using Face Recognition.

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data.

Proceedings of the Speech and Computer - 15th International Conference, 2013

Towards Live Subtitling of TV Ice-hockey Commentary.

Proceedings of the SIGMAP and WINSYS 2013, 2013

2012

Optimized Acoustic Likelihoods Computation for NVIDIA and ATI/AMD Graphics Processors.

IEEE Trans. Speech Audio Process., 2012

Captioning of Live TV Programs through Speech Recognition and Re-speaking.

Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

Influence of Different Phoneme Mappings on the Recognition Accuracy of Electrolaryngeal Speech.

Petr Stanislav

Proceedings of the SIGMAP and WINSYS 2012, 2012

Full covariance Gaussian mixture models evaluation on GPU.

Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, 2012

Novel Approach to Live Captioning Through Re-speaking: Tailoring Speech Recognition to Re-speaker's Needs.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

System for fast lexical and phonetic spoken term detection in a Czech cultural heritage archive.

EURASIP J. Audio Speech Music. Process., 2011

Speaker-Clustered Acoustic Models Evaluated on GPU for On-line Subtitling of Parliament Meetings.

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Optimization of the Gaussian Mixture Model Evaluation on GPU.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

Gender-Dependent Acoustic Models Fusion Developed for Automatic Subtitling of Parliament Meetings Broadcasted by the Czech TV.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

Fast Phonetic/Lexical Searching in the Archives of the Czech Holocaust Testimonies: Advancing Towards the MALACH Project Visions.

Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

2009

Using Morphological Information for Robust Language Modeling in Czech ASR System.

Pavel Ircing

IEEE Trans. Speech Audio Process., 2009

Discriminative Training of Gender-Dependent Acoustic Models.

Proceedings of the Text, Speech and Dialogue, 12th International Conference, 2009

Training of Speaker-clustered Acoustic Models for use in Real-time Recognizers.

Proceedings of the SIGMAP 2009, 2009

Fast Speaker Adaptation in Automatic Online Subtitling.

Proceedings of the SIGMAP 2009, 2009

2007

Benefit of Maximum Likelihood Linear Transform (MLLT) Used at Different Levels of Covariance Matrices Clustering in ASR Systems.

Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

Feature space reduction and decorrelation in a large number of speech recognition experiments.

Proceedings of the Signal and Image Processing (SIP 2007), 2007

Searching for a Robust MFCC-Based Parameterization for ASR Application.

Lubos Smídl

Proceedings of the SIGMAP 2007, 2007

Live TV Subtitling - Fast 2-pass LVCSR System for Online Subtitling.

Proceedings of the SIGMAP 2007, 2007

What Can and Cannot Be Found in Czech Spontaneous Speech Using Document-Oriented IR Methods - UWB at CLEF 2007 CL-SR Track.

Pavel Ircing

Jan Vavruska

Proceedings of the Advances in Multilingual and Multimodal Information Retrieval, 2007

2006

Automatic Online Subtitling of the Czech Parliament Meetings.

Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

Benefit of a Class-based Language Model for Real-time Closed-captioning of TV Ice-hockey Commentaries.

Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Comparison of keyword spotting methods for searching in speech.

Lubos Smídl

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Adaptive language model in automatic online subtitling.

Proceedings of the Second IASTED International Conference on Computational Intelligence, 2006

2005

Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project.

Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

2003

Towards Automatic Transcription of Spontaneous Czech Speech in the MALACH Project.

Proceedings of the Text, Speech and Dialogue, 6th International Conference, 2003

Building LVCSR System for Transcription of Spontaneously Pronounced Russian Testimonies in the MALACH Project: Initial Steps and First Results.

Proceedings of the Text, Speech and Dialogue, 6th International Conference, 2003

Large vocabulary ASR for spontaneous Czech in the MALACH project.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments.

Proceedings of the Text, Speech and Dialogue, 5th International Conference, 2002

2001

The Influence of a Filter Shape in Telephone-Based Recognition Module Using PLP Parameterization.

Ludek Müller

Proceedings of the Text, Speech and Dialogue, 4th International Conference, 2001

Comparison of MFCC and PLP parameterizations in the speaker independent continuous speech recognition task.

Ludek Müller