Runlong Zhou

According to our database, Runlong Zhou authored at least 15 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs.
CoRR, June, 2025

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO.
CoRR, May, 2025

CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models.
CoRR, April, 2025

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback.
CoRR, March, 2025

The Crucial Role of Samplers in Online Direct Preference Optimization.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Transformers are Efficient Compilers, Provably.
CoRR, 2024

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques.
CoRR, 2024

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization.
Trans. Mach. Learn. Res., 2023

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments.
Proceedings of the International Conference on Machine Learning, 2023

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes.
Proceedings of the International Conference on Machine Learning, 2023

2022
Horizon-Free Reinforcement Learning for Latent Markov Decision Processes.
CoRR, 2022

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems.
CoRR, 2022

2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

