Change Jia
Bibliography
2025
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design.
CoRR, June, 2025
CoRR, March, 2025
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance.
CoRR, February, 2025