Pavel Stepachev

According to our database, Pavel Stepachev authored at least 10 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data.
CoRR, January, 2026

2025
HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models.
CoRR, November, 2025

Back to Bytes: Revisiting Tokenization Through UTF-8.
CoRR, October, 2025

An Expanded Massive Multilingual Dataset for High-Performance Language Technologies.
CoRR, March, 2025

2024
Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models.
CoRR, 2024

Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation.
Proceedings of the Ninth Conference on Machine Translation, 2024

HPLT's First Release of Data and Models.
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2), 2024

2018
Multi-source synthetic treebank creation for improved cross-lingual dependency parsing.
Proceedings of the Second Workshop on Universal Dependencies, 2018