Clive Bai
Bibliography
2026
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex.
CoRR, May, 2026
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models.
CoRR, February, 2026
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models.
CoRR, February, 2026
CoRR, January, 2026