Tanja Samardzic

Ljiljana Dolamic

Fabio Rinaldi

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

DistaLs: a Comprehensive Collection of Language Distance Measures.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

NLP_DI at NADI 2024 shared task: Multi-label Arabic Dialect Classifications with an Unsupervised Cross-Encoder.

Proceedings of The Second Arabic Natural Language Processing Conference, 2024

A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets.

Steven Moran

Olga Pelloni

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

2023

Languages Through the Looking Glass of BPE Compression.

Comput. Linguistics, 2023

Optimizing the Size of Subword Vocabularies in Dialect Classification.

Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022

NLP DI at NADI Shared Task Subtask-1: Sub-word Level Convolutional Neural Models and Pre-trained Binary Classifiers for Dialect Identification.

Proceedings of the Seventh Arabic Natural Language Processing Workshop, 2022

TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP.

Steven Moran

Olga Pelloni

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Subword Evenness (SuE) as a Predictor of Cross-lingual Transfer to Low-resource Languages.

Olga Pelloni

Anastassia Shaitarova

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Early Guessing for Dialect Identification.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers.

Proceedings of the 26th Conference on Computational Natural Language Learning, 2022

2021

Interpretability for Morphological Inflection: from Character-level Predictions to Subword-level Rules.

Tatyana Ruzsics

Olga Sozinova

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

From characters to words: the turning point of BPE merges.

Olga Sozinova

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

ASR for Non-standardised Languages with Dialectal Variation: the case of Swiss German.

Iuliia Nigmatulina

Tannon Kew

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

UZH TILT: A Kaldi recipe for Swiss German Speech to Standard German Text.

Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, 2020

A Swiss German Dictionary: Variation in Speech and Writing.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019

Neural text normalization with adapted decoding and POS features.

Nat. Lang. Eng., 2019

Digitising Swiss German: how to process and study a polycentric spoken language.

Yves Scherrer

Elvira Glaser

Lang. Resour. Evaluation, 2019

Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning.

Tatyana Ruzsics

CoRR, 2019

2018

Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency.

Curdin Derungs

Int. J. Geogr. Inf. Sci., 2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Encoder-Decoder Methods for Text Normalization.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Future Actions for Swiss German - Workshop Results at SwissText 2018.

Mark Cieliebak

Jan Milan Deriu

Proceedings of the 3rd Swiss Text Analytics Conference, SwissText 2018, Winterthur, 2018

2017

Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology.

Dimitrios Alikaniotis

Paula Buttery

J. Quant. Linguistics, 2017

Neural Sequence-to-sequence Learning of Internal Word Structure.

Tatyana Ruzsics

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 2017

Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages.

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

2016

ArchiMob - A Corpus of Spoken Swiss German.

Yves Scherrer

Elvira Glaser

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora.

Maja Milicevic

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data.

Nikola Ljubesic

Curdin Derungs

Proceedings of COLING 2016, 2016

A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora.

Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, 2016

2015

Automatic interlinear glossing as two-level sequence classification.

Robert Schikowski

Sabine Stoll

Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, 2015

Regional Linguistic Data Initiative (ReLDI).

Nikola Ljubesic

Maja Milicevic

Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015

2014

Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote.

Noëmi Aepli

Ruprecht von Waldenfels

Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014

2012

Lemmatising Serbian as Category Tagging with Bidirectional Sequence Classification.

Andrea Gesmundo

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Lemmatisation as a Tagging Task.

Andrea Gesmundo

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2010

Cross-Lingual Validity of PropBank in the Manual Annotation of French.

Lonneke van der Plas