Ján Cegin

Orcid: 0000-0003-2692-9320

According to our database, Ján Cegin authored at least 15 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification.
CoRR, February, 2026

Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test Sets.
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2025
LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024
Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification.
CoRR, 2024

Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
A Game for Crowdsourcing Adversarial Examples for False Information Detection.
Proceedings of the 2nd Workshop on Adverse Impacts and Collateral Effects of Artificial Intelligence Technologies, 2022

2020
Machine learning based test data generation for safety-critical software.
Proceedings of the ESEC/FSE '20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020

Synthesized dataset for search-based test data generation methods focused on MC/DC criterion.
Proceedings of the 20th IEEE International Conference on Software Quality, 2020

Test Data Generation for MC/DC Criterion using Reinforcement Learning.
Proceedings of the 13th IEEE International Conference on Software Testing, 2020
