Tommi Jauhiainen

Orcid: 0000-0002-6474-3570

Affiliations:
  • University of Helsinki, Finland


According to our database, Tommi Jauhiainen authored at least 30 papers between 2001 and 2023.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2023
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation.
Lang. Resour. Evaluation, 2023

Language Variety Identification with True Labels.
CoRR, 2023

Findings of the VarDial Evaluation Campaign 2023.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Tuning HeLI-OTS for Guarani-Spanish Code Switching Analysis.
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), 2023

Automatic Word Segmentation for Egyptian Hieroglyphic Texts.
Proceedings of the Annual International Conference of the Alliance of Digital Humanities Organizations, 2023

2022
Optimizing Naive Bayes for Arabic Dialect Identification.
Proceedings of the Seventh Arabic Natural Language Processing Workshop, 2022

HeLI-OTS, Off-the-shelf Language Identifier for Text.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Language Identification as part of the Text Corpus Creation Pipeline at the Language Bank of Finland.
Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022

The Pipeline for Publishing Resources in the Language Bank of Finland.
Proceedings of the Selected Papers from the CLARIN Annual Conference 2022, 2022

2021
Comparing Approaches to Dravidian Language Identification.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Naive Bayes-based Experiments in Romanian Dialect Identification.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Findings of the VarDial Evaluation Campaign 2021.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

2020
Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus.
CoRR, 2020

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

Experiments in Language Variety Geolocation and Dialect Identification.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

A Report on the VarDial Evaluation Campaign 2020.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets.
Proceedings of the Interspeech 2020, 2020

Building Web Corpora for Minority Languages.
Proceedings of the 12th Web as Corpus Workshop, 2020

2019
Language model adaptation for language and dialect identification of text.
Nat. Lang. Eng., 2019

Automatic Language Identification in Texts: A Survey.
J. Artif. Intell. Res., 2019

Language and Dialect Identification of Cuneiform Texts.
CoRR, 2019

2018
HeLI-based Experiments in Swiss German Dialect Identification.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Iterative Language Model Adaptation for Indo-Aryan Language Identification.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

2017
Evaluating HeLI with Non-Linear Mappings.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Evaluation of language identification methods using 285 languages.
Proceedings of the 21st Nordic Conference on Computational Linguistics, 2017

2016
HeLI, a Word-Based Backoff Method for Language Identification.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

2015
Language Set Identification in Noisy Synthetic Multilingual Documents.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2015

2002
Adaptive Dialogue Systems - Interaction with Interact.
Proceedings of the SIGDIAL 2002 Workshop, 2002

2001
Using existing written language analyzers in understanding natural spoken Finnish.
Proceedings of the 13th Nordic Conference of Computational Linguistics, 2001

