Ziniu Li

Jiacai Liu

CoRR, December, 2025

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning.

CoRR, December, 2025

Trust Region Masking for Long-Horizon LLM Reinforcement Learning.

CoRR, December, 2025

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward.

CoRR, December, 2025

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning.

CoRR, December, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness.

CoRR, November, 2025

ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling.

CoRR, October, 2025

Scaling Latent Reasoning via Looped Language Models.

CoRR, October, 2025

Teaching Language Models to Reason with Tools.

CoRR, October, 2025

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation.

CoRR, September, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling.

CoRR, August, 2025

Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving.

CoRR, August, 2025

CoRT: Code-integrated Reasoning within Thinking.

CoRR, June, 2025

A Survey on Large Language Models for Mathematical Reasoning.

CoRR, June, 2025

Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models.

CoRR, June, 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO.

CoRR, May, 2025

Controlling Large Language Model with Latent Actions.

CoRR, March, 2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques.

CoRR, January, 2025

Enabling Scalable Oversight via Self-Evolving Critic.

CoRR, January, 2025

Controlling Large Language Model with Latent Action.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Adam-mini: Use Fewer Learning Rates To Gain More.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Preserving Diversity in Supervised Fine-Tuning of Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Sensing Jamming Strategy From Limited Observations: An Imitation Learning Perspective.

IEEE Trans. Signal Process., 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity.

CoRR, 2024

Adam-mini: Use Fewer Learning Rates To Gain More.

CoRR, 2024

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation.

CoRR, 2024

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization.

CoRR, 2024

Why Transformers Need Adam: A Hessian Perspective.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

When is RL better than DPO in RLHF? A Representation and Optimization Perspective.

Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Unlocking Black-Box Prompt Tuning Efficiency via Zeroth-Order Optimization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023

Policy Optimization in RLHF: The Impact of Out-of-preference Data.

CoRR, 2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.

CoRR, 2023

Deploying Offline Reinforcement Learning with Human Feedback.

CoRR, 2023

Theoretical Analysis of Offline Imitation With Supplementary Dataset.

CoRR, 2023

Provably Efficient Adversarial Imitation Learning with Unknown Transitions.

Proceedings of the Uncertainty in Artificial Intelligence, 2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Error Bounds of Imitating Policies and Environments for Reinforcement Learning.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis.

CoRR, 2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle.

CoRR, 2022

Rethinking ValueDice: Does It Really Improve Performance?

CoRR, 2022

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions.

CoRR, 2021

2020

Solving the Inverse Design Problem of Electrical Fuse With Machine Learning.

IEEE Access, 2020

Error Bounds of Imitating Policies and Environments.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Efficient Exploration by Novelty-Pursuit.

Xiong-Hui Chen

Proceedings of the Distributed Artificial Intelligence - Second International Conference, 2020

2019

On Value Discrepancy of Imitation Learning.
