Ján Cegin

Orcid: 0000-0003-2692-9320

According to our database¹, Ján Cegin authored at least 15 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number² of five.
Erdős number³ of four.

Bibliography

2026

Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification.

CoRR, February, 2026

Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test Sets.

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2025

LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?

Ján Cegin

Jakub Simko

Peter Brusilovsky

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification.

CoRR, 2024

Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness.

Ján Cegin

Jakub Simko

Peter Brusilovsky

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

A Game for Crowdsourcing Adversarial Examples for False Information Detection.

Ján Cegin

Jakub Simko

Peter Brusilovsky

Proceedings of the 2nd Workshop on Adverse Impacts and Collateral Effects of Artificial Intelligence Technologies, 2022

2020

Machine learning based test data generation for safety-critical software.

Ján Cegin

Proceedings of the ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020

Synthesized dataset for search-based test data generation methods focused on MC/DC criterion.

Ján Cegin

Karol Rástocný

Mária Bieliková

Proceedings of the 20th IEEE International Conference on Software Quality, 2020

Test Data Generation for MC/DC Criterion using Reinforcement Learning.

Ján Cegin

Karol Rástocný