Peter Rupnik

Orcid: 0009-0000-9700-3686

According to our database, Peter Rupnik authored at least 21 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification.
CoRR, February, 2026

Mići Princ - A Little Boy Teaching Speech Technologies the Chakavian Dialect.
CoRR, February, 2026

The Growing Gains and Pains of Iterative Web Corpora Crawling: Insights from South Slavic CLASSLA-web 2.0 Corpora.
CoRR, January, 2026

Regional Variation in the Performance of ASR Models on Croatian and Serbian.
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects, 2026

2025
State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
CoRR, November, 2025

ParlaSpeech 3.0: Richly Annotated Spoken Parliamentary Corpora of Croatian, Czech, Polish, and Serbian.
CoRR, November, 2025

ParlaMint II: advancing comparable parliamentary corpora across Europe.
Lang. Resour. Evaluation, September, 2025

Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models.
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR, 2024

JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far.
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects, 2024

DIALECT-COPA: Extending the Standard Translations of the COPA Causal Commonsense Reasoning Dataset to South Slavic Dialects.
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects, 2024

The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings.
Proceedings of the Speech and Computer - 26th International Conference, 2024

Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

2022
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia.
CoRR, 2022

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2021
Improving Effectiveness of a Coaching System Through Preference Learning.
Proceedings of the PETRA '21: The 14th PErvasive Technologies Related to Assistive Environments Conference, Virtual Event, Greece, 29 June, 2021

