Jindrich Zdánský

2025

Combining multilingual resources to enhance end-to-end speech recognition systems for Scandinavian languages.

Speech Commun., 2025

2024

A lightweight approach to real-time speaker diarization: from audio toward audio-visual data streams.

EURASIP J. Audio Speech Music. Process., December, 2024

2023

Developing State-of-the-Art End-to-End ASR for Norwegian.

Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

Online Speaker Diarization Using Optimized SE-ResNet Architecture.

Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

Online Punctuation Restoration using ELECTRA Model for streaming ASR Systems.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification.

IEEE/ACM Trans. Audio Speech Lang. Process., 2022

Lexicon-based vs. Lexicon-free ASR for Norwegian Parliament Speech Transcription.

Proceedings of the Text, Speech, and Dialogue - 25th International Conference, 2022

Overlapped Speech Detection in Broadcast Streams Using X-vectors.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Identification of related languages from spoken data: Moving from off-line to on-line scenario.

Comput. Speech Lang., 2021

Blind Extraction of Target Speech Source Guided by Supervised Speaker Identification via X-vectors.

CoRR, 2021

Identification of Scandinavian Languages from Speech Using Bottleneck Features and X-Vectors.

Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

Using X-Vectors for Speech Activity Detection in Broadcast Streams.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Blind Extraction of Moving Audio Source in a Challenging Environment Supported by Speaker Identification Via X-Vectors.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Very Fast Keyword Spotting System with Real Time Factor Below 0.01.

Proceedings of the Text, Speech, and Dialogue, 2020

Voice-Activity and Overlapped Speech Detection Using x-Vectors.

Proceedings of the Text, Speech, and Dialogue, 2020

Adaptive Blind Audio Source Extraction Supervised By Dominant Speaker Identification Using X-Vectors.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Optical Character Recognition for Audio-Visual Broadcast Transcription System.

Proceedings of the 11th IEEE International Conference on Cognitive Infocommunications, 2020

2019

On Practical Aspects of Multi-condition Training Based on Augmentation for Reverberation-/Noise-Robust Speech Recognition.

Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs.

Lukás Mateju

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Robust Recognition of Conversational Telephone Speech via Multi-condition Training and Data Augmentation.

Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Robust Recognition of Speech with Background Music in Acoustically Under-Resourced Scenarios.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Speech Activity Detection in online broadcast transcription using Deep Neural Networks and Weighted Finite State Transducers.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Robust Automatic Recognition of Speech with background music.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Study on the Use of Deep Neural Networks for Speech Activity Detection in Broadcast Recordings.

Lukás Mateju

Proceedings of the 13th International Joint Conference on e-Business and Telecommunications (ICETE 2016), 2016

Investigation into the Use of WFSTs and DNNs for Speech Activity Detection in Broadcast Data Transcription.

Lukás Mateju

Proceedings of the E-Business and Telecommunications - 13th International Joint Conference, 2016

2015

Compensation of nonlinear distortions in speech for automatic recognition.

Proceedings of the 38th International Conference on Telecommunications and Signal Processing, 2015

2014

Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives.

Speech Commun., 2013

2012

Making Czech Historical Radio Archive Accessible and Searchable for Wide Public.

J. Multim., 2012

Incorporation of the ASR output in speaker segmentation and clustering within the task of speaker diarization of broadcast streams.

Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Large-scale processing, indexing and search system for Czech audio-visual cultural heritage archives.

Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Browsing, indexing and automatic transcription of lectures for distance learning.

Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Real-Time Lecture Transcription using ASR for Czech Hearing Impaired or Deaf Students.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011

Voice Technology to Enable Sophisticated Access to Historical Audio Archive of the Czech Radio.

Proceedings of the Multimedia for Cultural Heritage - First International Workshop, 2011

PLDA-Based Clustering for Speaker Diarization of Broadcast Streams.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2009

Very large vocabulary voice dictation for mobile devices.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Challenges in Speech Processing of Slavic Languages (Case Studies in Speech Recognition of Czech and Slovak).

Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009

2008

Study on Speaker Adaptation Methods in the Broadcast News Transcription Task.

Proceedings of the Text, Speech and Dialogue, 11th International Conference, 2008

Joint audio-visual processing, representation and indexing of TV news programmes.

Josef Chaloupka

Proceedings of the International Workshop on Multimedia Signal Processing, 2008

Czech-to-slovak adapted broadcast news transcription system.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Enhancement of noisy speech recordings via blind source separation.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

MLLR Transforms Based Speaker Recognition in Broadcast Streams.

Jan Silovský

Proceedings of the Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions, 2008

Voice Technology Applied for Building a Prototype Smart Room.

Proceedings of the Multimodal Signals: Cognitive and Algorithmic Issues, 2008

Audio-visual voice command recognition in noisy conditions.

Josef Chaloupka

Proceedings of the International Conference on Auditory-Visual Speech Processing 2008, 2008

2006

A System for Information Retrieval from Large Records of Czech Spoken Data.

Proceedings of the Text, Speech and Dialogue, 9th International Conference, 2006

BINSEG: an efficient speaker-based segmentation technique.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Continual on-line monitoring of Czech spoken broadcast programs.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

The COST278 broadcast news segmentation and speaker clustering evaluation - overview, methodology, systems, results.

Laura Docío Fernández

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Detection of acoustic change-points in audio records via global BIC maximization and dynamic programming.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004

Fully Automated Approach to Broadcast News Transcription in Czech Language.

Petr David

Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

An improved preprocessor for the automatic transcription of broadcast news audio stream.

Petr David