Scarlett Li

Orcid: 0009-0002-8912-4861

According to our database, Scarlett Li authored at least 18 papers between 2024 and 2026.

Bibliography

2026
Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems.
CoRR, March, 2026

TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation.
CoRR, February, 2026

Closing the Loop: Universal Repository Representation with RPG-Encoder.
CoRR, February, 2026

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests.
CoRR, January, 2026

2025
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation.
CoRR, September, 2025

rStar2-Agent: Agentic Reasoning Technical Report.
CoRR, August, 2025

Data Efficacy for Language Model Training.
CoRR, June, 2025

IterPref: Focal Preference Learning for Code Generation via Iterative Debugging.
CoRR, March, 2025

EpiCoder: Encompassing Diversity and Complexity in Code Generation.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Teaching Your Models to Understand Code via Focal Preference Alignment.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ProductMeta: An Interactive System for Metaphorical Product Design Ideation with Multimodal Large Language Models.
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 2025

MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Value Compass Benchmarks: A Comprehensive, Generative and Self-Evolving Platform for LLMs' Value Evaluation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2025

2024
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
CoRR, 2024

RedStone: Curating General, Code, Math, and QA Data for Large Language Models.
CoRR, 2024

Significant ASR Error Detection for Conversational Voice Assistants.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

