Hynek Kydlícek

According to our database, Hynek Kydlícek authored at least 7 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
FineWeb2: One Pipeline to Scale Them All - Adapting Pre-Training Data Processing to Every Language.
CoRR, June, 2025

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs.
CoRR, February, 2025

SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model.
CoRR, February, 2025

Towards Best Practices for Open Datasets for LLM Training.
CoRR, January, 2025

2024
BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism.
CoRR, 2024

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
A Dataset and Strong Baselines for Classification of Czech News Texts.
Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023
