Vinko Sabolcec

According to our database, Vinko Sabolcec authored at least 6 papers between 2025 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection.
CoRR, April, 2026

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025
FineWeb2: One Pipeline to Scale Them All - Adapting Pre-Training Data Processing to Every Language.
CoRR, June, 2025

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs.
CoRR, April, 2025

Enhancing Multilingual LLM Pretraining with Model-Based Data Selection.
CoRR, February, 2025

URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
