Jan Hajic

ORCID: 0000-0002-3503-7730

Affiliations:
  • Charles University, Institute of Formal and Applied Linguistics, Prague, Czech Republic


According to our database, Jan Hajic authored at least 133 papers between 1982 and 2025.

Bibliography

2025
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies.
CoRR, March, 2025

Semantic Role Labeling: A Systematical Survey.
CoRR, February, 2025

2024
Textual Coverage of Eventive Entries in Lexical Semantic Resources.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Building a Broad Infrastructure for Uniform Meaning Representations.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Consulting the Community: How to Reach Digital Language Equality in Europe by 2030?
Proceedings of the European Language Equality, 2023

Results of the Forward-looking Community-wide Consultation.
Proceedings of the European Language Equality, 2023

Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions.
CoRR, 2023

What's the Meaning of Superhuman Performance in Today's NLU?
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Making a Semantic Event-type Ontology Multilingual.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Quality and Efficiency of Manual Annotation: Pre-annotation Bias.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Overview of the ELE Project.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2021
Designing a Uniform Meaning Representation for Natural Language Processing.
Künstliche Intell., 2021

SynSemClass for German: Extending a Multilingual Verb Lexicon.
Proceedings of the Conference on Digital Curation Technologies (Qurator 2021), Berlin, Germany, February 8th - to, 2021

The Interaction of Personal Data, Intellectual Property and Freedom of Expression in the Context of Language Research.
Proceedings of the Selected Papers from the CLARIN Annual Conference 2021, 2021

2020
SynSemClass Linked Lexicon: Mapping Synonymy between Languages.
Proceedings of the 2020 Globalex Workshop on Linked Lexicography, 2020

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Prague Dependency Treebank - Consolidated 1.0.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

CLARIN: Distributed Language Resources and Technology in a European Infrastructure.
Proceedings of the 1st International Workshop on Language Technology Platforms, 2020

FGD at MRP 2020: Prague Tectogrammatical Graphs.
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, 2020

MRP 2020: The Second Shared Task on Cross-Framework and Cross-Lingual Meaning Representation Parsing.
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, 2020

2019
Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing.
CoRR, 2019

UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging.
CoRR, 2019

Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER.
Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

MRP 2019: Cross-Framework Meaning Representation Parsing.
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning, 2019

The Impact of Copyright and Personal Data Laws on the Creation and Use of Models for Language Technologies.
Proceedings of the Selected Papers from the CLARIN Annual Conference 2019, Leipzig, Germany, September 30, 2019

Neural Architectures for Nested NER through Linearization.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Tools for Building an Interlinked Synonym Lexicon Network.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Creating a Verb Synonym Lexicon Based on a Parallel Corpus.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

SumeCzech: Large Czech News-Based Summarization Dataset.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Diacritics Restoration Using Neural Networks.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Bridging the LAPPS Grid and CLARIN.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Belgium, October 31, 2018

Synonymy in Bilingual Context: The CzEngClass Lexicon.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Expletives in Universal Dependency Treebanks.
Proceedings of the Second Workshop on Universal Dependencies, 2018

2017
PDTSC 2.0 - Spoken Corpus with Rich Multi-layer Structural Annotation.
Proceedings of the Text, Speech, and Dialogue - 20th International Conference, 2017

Extracting Verbal Multiword Data from Rich Treebank Annotation.
Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15), 2017

Syntactic-Semantic Classes of Context-Sensitive Synonyms Based on a Bilingual Corpus.
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2017

2016
Linguistically Annotated Corpus as an Invaluable Resource for Advancements in Linguistic Research: A Case Study.
Prague Bull. Math. Linguistics, 2016

The strategic impact of META-NET on the regional, national and international level.
Lang. Resour. Evaluation, 2016

Neural Networks for Featureless Named Entity Recognition in Czech.
Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation.
Proceedings of the 12th Workshop on Multiword Expressions, 2016

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Towards Comparability of Linguistic Graph Banks for Semantic Parsing.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Universal Dependencies v1: A Multilingual Treebank Collection.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

European Platform for the Multilingual Digital Single Market: Conceptual Proposal.
Proceedings of the Human Language Technologies - The Baltic Perspective, 2016

TectoMT - a deep linguistic core of the combined Cimera MT system.
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products, 2016

Joint search in a bilingual valency lexicon and an annotated corpus.
Proceedings of the COLING 2016, 2016

2015
SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation.
Proceedings of the Third International Conference on Dependency Linguistics, 2015

Deletions and Node Reconstructions in a Dependency-Based Multilevel Annotation Scheme.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2015

Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus.
Proceedings of The 9th Linguistic Annotation Workshop, 2015

2014
HamleDT: Harmonized multi-language dependency treebank.
Lang. Resour. Evaluation, 2014

Adaptation of machine translation for multilingual information retrieval in the medical domain.
Artif. Intell. Medicine, 2014

Machine Translation of Medical Texts in the Khresmoi Project.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

Observations and Lessons Learnt from Non Health Professionals Evaluating a Health Search Engine.
Proceedings of the e-Health - For Continuity of Care - Proceedings of MIE2014, the 25th European Medical Informatics Conference, Istanbul, Turkey, August 31, 2014

Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

CLARA: A New Generation of Researchers in Common Language Resources and Their Applications.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Comparing Czech and English AMRs.
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing, 2014

Verbal Valency Frame Detection and Selection in Czech and English.
Proceedings of the Second Workshop on EVENTS: Definition, 2014

Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013
Joint Morphological and Syntactic Analysis for Richly Inflected Languages.
Trans. Assoc. Comput. Linguistics, 2013

A New State-of-The-Art Czech Named Entity Recognizer.
Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus.
Proceedings of the 9th Workshop on Multiword Expressions, 2013

2012
HamleDT: To Parse or Not to Parse?
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Announcing Prague Czech-English Dependency Treebank 2.0.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval.
Proceedings of the Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics, 2012

2011
Frederick Jelinek's Obituary.
Prague Bull. Math. Linguistics, 2011

2010
Treebank Annotation.
Proceedings of the Handbook of Natural Language Processing, Second Edition., 2010

Reliving the History: The Beginnings of Statistical Machine Translation and Languages with Rich Morphology.
Proceedings of the Advances in Natural Language Processing, 2010

Resources for adding semantics to machine translation.
Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

2009
Tectogrammatical Annotation of the Wall Street Journal.
Prague Bull. Math. Linguistics, 2009

A cost-effective lexical acquisition process for large-scale thesaurus translation.
Lang. Resour. Evaluation, 2009

Semi-Supervised Training for the Averaged Perceptron POS Tagger.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages.
Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, 2009

2008
The Czech Academic Corpus 2.0 Guide.
Prague Bull. Math. Linguistics, 2008

Phrase-Based and Deep Syntactic English-to-Czech Statistical Machine Translation.
Proceedings of the Third Workshop on Statistical Machine Translation, 2008

PDTSL: An annotated resource for speech reconstruction.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Validating the Quality of Full Morphological Annotation.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

2007
Some of Our Best Friends Are Statisticians.
Proceedings of the Text, Speech and Dialogue, 10th International Conference, 2007

2006
Perspectives of Turning Prague Dependency Treebank into a Knowledge Base.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Leveraging Recurrent Phrase Structure in Large-scale Ontology Translation.
Proceedings of the 11th Annual conference of the European Association for Machine Translation, 2006

Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation.
Proceedings of the ACL 2006, 2006

2005
Cross-language text classification.
Proceedings of the SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

Non-Projective Dependency Parsing using Spanning Tree Algorithms.
Proceedings of the HLT/EMNLP 2005, 2005

Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Prague Czech-English dependency treebank: resource for structure-based MT.
Proceedings of the 10th EAMT Conference: Practical applications of machine translation, 2005

2004
Automatic recognition of spontaneous speech for access to multilingual oral history archives.
IEEE Trans. Speech Audio Process., 2004

Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

The development of ASR for Slavic languages in the MALACH project.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Disambiguation of Rich Inflection - Computational Morphology of Czech.
Charles University, ISBN: 978-80-246-0282-0, 2004

2003
Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation.
Prague Bull. Math. Linguistics, 2003

Towards Automatic Transcription of Spontaneous Czech Speech in the MALACH Project.
Proceedings of the Text, Speech and Dialogue, 6th International Conference, 2003

Building LVCSR System for Transcription of Spontaneously Pronounced Russian Testimonies in the MALACH Project: Initial Steps and First Results.
Proceedings of the Text, Speech and Dialogue, 6th International Conference, 2003

A simple multilingual machine translation system.
Proceedings of Machine Translation Summit IX: Papers, 2003

Large vocabulary ASR for spontaneous czech in the MALACH project.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Combination of a hidden tag model and a traditional n-gram model: a case study in czech speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002
Testing the Limits - Adding a New Language to an MT System.
Prague Bull. Math. Linguistics, 2002

Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments.
Proceedings of the Text, Speech and Dialogue, 5th International Conference, 2002

Cross-Language Access to Recorded Speech in the MALACH Project.
Proceedings of the Text, Speech and Dialogue, 5th International Conference, 2002

Tectogrammatical representation: towards a minimal transfer in machine translation.
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks, 2002

2001
The Current Status of the Prague Dependency Treebank.
Proceedings of the Text, Speech and Dialogue, 4th International Conference, 2001

On large vocabulary continuous speech recognition of highly inflectional language - czech.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Serial Combination of Rules and Statistics: A Case Study in Czech Tagging.
Proceedings of the Association for Computational Linguistics, 2001

2000
Morpheme Based Language Models for Speech Recognition of Czech.
Proceedings of the Text, Speech and Dialogue - Third International Workshop, 2000

Morphological Tagging: Data vs. Dictionaries.
Proceedings of the 6th Applied Natural Language Processing Conference, 2000

Machine Translation of Very Close Languages.
Proceedings of the 6th Applied Natural Language Processing Conference, 2000

1999
Word Sense Disambiguation of Czech Texts.
Proceedings of the Text, Speech and Dialogue - Second International Workshop, 1999

Large Vocabulary Speech Recognition for Read and Broadcast Czech.
Proceedings of the Text, Speech and Dialogue - Second International Workshop, 1999

A Statistical Parser for Czech.
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999

1998
Czech language processing, POS tagging.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset.
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998

1997
Probabilistic and Rule-Based Tagger of an Inflective Language- a Comparison.
Proceedings of the 5th Applied Natural Language Processing Conference, 1997

1995
Machine Translation in the Czech Republic: history, methods, systems.
Proceedings of Machine Translation Summit V, 1995

1993
But Dictionaries Are Data Too.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

1992
Derivation Of Underlying Valency Frames From A Learner's Dictionary.
Proceedings of the 14th International Conference on Computational Linguistics, 1992

Tagging and Alignment of Parallel Texts: Current Status of BCP.
Proceedings of the 3rd Applied Natural Language Processing Conference, 1992

1990
Spelling-checking for Highly Inflective Languages.
Proceedings of the 13th International Conference on Computational Linguistics, 1990

1988
Formal morphology.
Proceedings of the 12th International Conference on Computational Linguistics, 1988

1987
RUSLAN - An MT System Between Closely Related Languages.
Proceedings of the EACL 1987, 1987

1982
Inferencing And Search For An Answer In TIBAQ.
Proceedings of the 9th International Conference on Computational Linguistics, 1982

