Tengyu Xu

CoRR, May, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback.

[BibT_eX]

[DOI]

CoRR, January, 2025

HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback.

[BibT_eX]

[DOI]

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization.

[BibT_eX]

[DOI]

Zishun Yu

Di Jin

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation.

[BibT_eX]

[DOI]

Chengwei Qin

Wenxuan Zhou

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

2024

Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward.

[BibT_eX]

[DOI]

IEEE Trans. Inf. Theory, September, 2024

Faster algorithm and sharper analysis for constrained Markov decision process.

[BibT_eX]

[DOI]

Oper. Res. Lett., 2024

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following.

[BibT_eX]

[DOI]

CoRR, 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges.

[BibT_eX]

[DOI]

CoRR, 2024

2023

Constraint-based multi-agent reinforcement learning for collaborative tasks.

[BibT_eX]

[DOI]

Comput. Animat. Virtual Worlds, 2023

2022

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward.

[BibT_eX]

[DOI]

CoRR, 2022

Deterministic policy gradient: Convergence analysis.

[BibT_eX]

[DOI]

Proceedings of the Uncertainty in Artificial Intelligence, 2022

A Unifying Framework of Off-Policy General Value Function Evaluation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Model-Based Offline Meta-Reinforcement Learning with Regularization.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method.

[BibT_eX]

[DOI]

Ziwei Guan

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

A Unified Off-Policy Evaluation Approach for General Value Function.

[BibT_eX]

[DOI]

CoRR, 2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee.

[BibT_eX]

[DOI]

Guanghui Lan

Proceedings of the 38th International Conference on Machine Learning, 2021

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.

[BibT_eX]

[DOI]

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence.

[BibT_eX]

[DOI]

Ziwei Guan

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis.

[BibT_eX]

[DOI]

Guanghui Lan

CoRR, 2020

Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization.

[BibT_eX]

[DOI]

CoRR, 2020

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms.

[BibT_eX]

[DOI]

Zhe Wang

CoRR, 2020

Improving Sample Complexity Bounds for Actor-Critic Algorithms.

[BibT_eX]

[DOI]

Zhe Wang

CoRR, 2020

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms.

[BibT_eX]

[DOI]

Zhe Wang

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Reanalysis of Variance Reduced Temporal Difference Learning.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation.

[BibT_eX]

[DOI]

Shaofeng Zou

CoRR, 2019

Finite-Sample Analysis for SARSA with Linear Function Approximation.

[BibT_eX]

[DOI]

Shaofeng Zou

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples.

[BibT_eX]

[DOI]

Shaofeng Zou