Tomaz Erjavec

ORCID: 0000-0002-1560-4099

According to our database, Tomaz Erjavec authored at least 88 papers between 1990 and 2024.

Bibliography

2024

2023
The ParlaMint corpora of parliamentary proceedings.
Lang. Resour. Evaluation, March, 2023

2022
Dealing with Abbreviations in the Slovenian Biographical Lexicon.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

TEI and Git in ParlaMint: Collaborative Development of Language Resources.
Proceedings of the Selected Papers from the CLARIN Annual Conference 2022, 2022

2021
The KAS corpus of Slovenian academic writing.
Lang. Resour. Evaluation, 2021

EveOut: an event-centric news dataset to analyze an outlet's event selection patterns.
Informatica (Slovenia), 2021

2020
The Janes project: language resources and tools for Slovene user generated content.
Lang. Resour. Evaluation, 2020

MULTEXT-East.
CoRR, 2020

Gigafida 2.0: The Reference Corpus of Written Standard Slovene.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019
How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts.
Nat. Lang. Eng., 2019

KAS-term: Extracting Slovene Terms from Doctoral Theses via Supervised Machine Learning.
Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English.
Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019

2018
CLARIN's Key Resource Families.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Datasets of Slovene and Croatian Moderated News Comments.
Proceedings of the 2nd Workshop on Abusive Language Online, 2018

2017
Slovenian Biography.
Proceedings of the Second Conference on Biographical Data in a Digital World 2017, 2017

Language-independent Gender Prediction on Twitter.
Proceedings of the Second Workshop on NLP and Computational Social Science, 2017

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

The Universal Dependencies Treebank for Slovenian.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene.
Proceedings of the First Workshop on Abusive Language Online, 2017

2016
TextFlows: A visual programming platform for text mining and natural language processing.
Sci. Comput. Program., 2016

Modernising historical Slovene words.
Nat. Lang. Eng., 2016

Overview of Annotation Creation: Processes & Tools.
CoRR, 2016

Gold-Standard Datasets for Annotation of Slovene Computer-Mediated Communication.
Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016

Corpus-Based Diacritic Restoration for South Slavic Languages.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Normalising Slovene data: historical texts vs. user-generated content.
Proceedings of the 13th Conference on Natural Language Processing, 2016

Closing a Gap in the Language Resources Landscape: Groundwork and Best Practices from Projects on Computer-mediated Communication in four European Countries.
Proceedings of the Selected papers from the CLARIN Annual Conference 2016, 2016

2015
The IMP historical Slovene language resources.
Lang. Resour. Evaluation, 2015

The slWaC Corpus of the Slovene Web.
Informatica (Slovenia), 2015

Predicting the Level of Text Standardness in User-generated Content.
Proceedings of the Recent Advances in Natural Language Processing, 2015

2014
TweetCaT: a tool for building Twitter corpora of smaller languages.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

sloWCrowd: A crowdsourcing tool for lexicographic tasks.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Standardizing Tweets with Character-Level Machine Translation.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2014

2013
Modernizing historical Slovene words with character-based SMT.
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013

2012
MULTEXT-East: morphosyntactic resources for Central and Eastern European languages.
Lang. Resour. Evaluation, 2012

NLP Web Services for Slovene and English: Morphosyntactic Tagging, Lemmatisation and Definition Extraction.
Informatica (Slovenia), 2012

The goo300k corpus of historical Slovene.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Lexicon Construction and Corpus Annotation of Historical Language with the CoBaLT Editor.
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, 2012

2011
hrWaC and slWac: Compiling Web Corpora for Croatian and Slovene.
Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Automatic linguistic annotation of historical language: ToTrTaLe and XIX century Slovene.
Proceedings of the 5th ACL Workshop on Language Technology for Cultural Heritage, 2011

OWL/DL formalization of the MULTEXT-East morphosyntactic specifications.
Proceedings of the Fifth Linguistic Annotation Workshop, 2011

2010
LemmaGen: Multilingual Lemmatisation with Induced Ripple-Down Rules.
J. Univers. Comput. Sci., 2010

Experimental Deployment of a Grid Virtual Organization for Human Language Technologies.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

The JOS Linguistically Tagged Corpus of Slovene.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

2009
A Common XML-based Framework for Syntactic Annotations.
CoRR, 2009

2008
Improving Morphosyntactic Tagging of Slovene Language through Meta-tagging.
Informatica (Slovenia), 2008

Ripple Down Rule learning for automated word lemmatisation.
AI Commun., 2008

Designing and Evaluating a Russian Tagset.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

The JOS Morphosyntactically Tagged Corpus of Slovene.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

2007
Quantifying the MULTEXT-East morphosyntactic resources.
Proceedings of the Exact Methods in the Study of Language and Text, 2007

2006
Morphosyntactic Tagging of Slovene Legal Language.
Informatica (Slovenia), 2006

A tool set for the quick and efficient exploration of large document collections.
CoRR, 2006

The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Building Slovene WordNet.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

The English-Slovene ACQUIS corpus.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Towards a Slovene Dependency Treebank.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

TEI and Microsoft: a marriage made in....
Proceedings of the Digital Historical Corpora - Architecture, Annotation, and Retrieval, 03.12., 2006

2005
The VoiceTRAN Speech-to-Speech Communicator.
Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005

Digital Critical Editions of Slovenian Literature: an Application of Collaborative Work Using Open Standards.
Proceedings of the From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing held at Katholieke Universiteit Leuven, 2005

Initial considerations in building a speech-to-speech translation system for the Slovenian-English language pair.
Proceedings of the 10th EAMT Conference: Practical applications of machine translation, 2005

2004
Morpho-Syntactic Descriptions in MULTEXT-East - the Case of Serbian.
Informatica (Slovenia), 2004

Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words.
Appl. Artif. Intell., 2004

Towards an International Standard on Feature Structure Representation.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Making an XML-based Japanese-Slovene Learners' Dictionary.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Migrating Language Resources from SGML to XML: The Text Encoding Initiative Recommendations.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

2003
Encoding Biomedical Resources in TEI: The Case of the GENIA Corpus.
Proceedings of the Workshop on Natural Language Processing in Biomedicine, 2003

Stretching TEI: Converting the Genia Corpus.
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora, 2003

2002
Compiling and Using the IJS-ELAN Parallel Corpus.
Informatica (Slovenia), 2002

Sense Discrimination with Parallel Corpora.
Proceedings of the ACL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, 2002

2001
Automatic Sense Tagging Using Parallel Corpora.
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001

Harmonised Morphosyntactic Tagging for Seven Languages and Orwell's 1984.
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001

2000
Rules for Automatic Grapheme-to-Allophone Transcription in Slovene.
Proceedings of the Text, Speech and Dialogue - Third International Workshop, 2000

Corpora of Slovene Spoken Language for Multi-lingual Applications.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

The Concede Model for Lexical Databases.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

1999
The ELAN Slovene-English aligned corpus.
Proceedings of Machine Translation Summit VII, 1999

Learning to Lemmatise Slovene Words.
Proceedings of the Learning Language in Logic, 1999

Learning Word Segmentation Rules for Tag Prediction.
Proceedings of the Inductive Logic Programming, 9th International Workshop, 1999

Morphosyntactic Tagging of Slovene Using Progol.
Proceedings of the Inductive Logic Programming, 9th International Workshop, 1999

1998
Standardised specifications, development and assessment of large morpho-lexical resources for six central and eastern european languages.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

East meets West: multilingual resources in a European context.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

The MULTEXT East corpus.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Learning Multilingual Morphology with CLOG.
Proceedings of the Inductive Logic Programming, 8th International Workshop, 1998

Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages.
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998

1997
Induction of Slovene Nominal Paradigms.
Proceedings of the Inductive Logic Programming, 7th International Workshop, 1997

1990
An Integrated System For Morphological Analysis Of The Slovene Language.
Proceedings of the 13th International Conference on Computational Linguistics, 1990

