Tanja Samardzic

Orcid: 0000-0001-6451-3946

According to our database, Tanja Samardzic authored at least 34 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A Measure for Transparent Comparison of Linguistic Diversity in Multilingual NLP Data Sets.
CoRR, 2024

2023
Languages Through the Looking Glass of BPE Compression.
Comput. Linguistics, 2023

Optimizing the Size of Subword Vocabularies in Dialect Classification.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022
NLP DI at NADI Shared Task Subtask-1: Sub-word Level Convolutional Neural Models and Pre-trained Binary Classifiers for Dialect Identification.
Proceedings of the Seventh Arabic Natural Language Processing Workshop, 2022

TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Subword Evenness (SuE) as a Predictor of Cross-lingual Transfer to Low-resource Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Early Guessing for Dialect Identification.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers.
Proceedings of the 26th Conference on Computational Natural Language Learning, 2022

2021
Interpretability for Morphological Inflection: from Character-level Predictions to Subword-level Rules.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

From characters to words: the turning point of BPE merges.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
ASR for Non-standardised Languages with Dialectal Variation: the case of Swiss German.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

UZH TILT: A Kaldi recipe for Swiss German Speech to Standard German Text.
Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, 2020

A Swiss German Dictionary: Variation in Speech and Writing.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019
Neural text normalization with adapted decoding and POS features.
Nat. Lang. Eng., 2019

Digitising Swiss German: how to process and study a polycentric spoken language.
Lang. Resour. Evaluation, 2019

Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning.
CoRR, 2019

2018
Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency.
Int. J. Geogr. Inf. Sci., 2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Encoder-Decoder Methods for Text Normalization.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Future Actions for Swiss German - Workshop Results at SwissText 2018.
Proceedings of the 3rd Swiss Text Analytics Conference, SwissText 2018, Winterthur, 2018

2017
Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology.
J. Quant. Linguistics, 2017

Neural Sequence-to-sequence Learning of Internal Word Structure.
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 2017

Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

2016
ArchiMob - A Corpus of Spoken Swiss German.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data.
Proceedings of COLING 2016, 2016

A Comparison Between Morphological Complexity Measures: Typological Data vs. Language Corpora.
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, 2016

2015
Automatic interlinear glossing as two-level sequence classification.
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, 2015

Regional Linguistic Data Initiative (ReLDI).
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015

2014
Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote.
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014

2012
Lemmatising Serbian as Category Tagging with Bidirectional Sequence Classification.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Lemmatisation as a Tagging Task.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

2010
Cross-Lingual Validity of PropBank in the Manual Annotation of French.
Proceedings of the Fourth Linguistic Annotation Workshop, 2010

