Xuerui Su
Bibliography
2025
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management.
CoRR, May, 2025
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning.
CoRR, April, 2025
CoRR, February, 2025