Ziniu Li

Orcid: 0000-0003-0449-002X

According to our database, Ziniu Li authored at least 58 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
A Survey on Large Language Models for Mathematical Reasoning.
ACM Comput. Surv., June, 2026

Do Phone-Use Agents Respect Your Privacy?
CoRR, April, 2026

Off-Policy Value-Based Reinforcement Learning for Large Language Models.
CoRR, March, 2026

Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints.
CoRR, March, 2026

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL.
CoRR, February, 2026

Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It.
CoRR, February, 2026

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning.
CoRR, January, 2026

Stepwise Guided Policy Optimization: Coloring Your Incorrect Reasoning in GRPO.
Trans. Mach. Learn. Res., 2026

2025
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements.
CoRR, December, 2025

A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms.
CoRR, December, 2025

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning.
CoRR, December, 2025

Trust Region Masking for Long-Horizon LLM Reinforcement Learning.
CoRR, December, 2025

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward.
CoRR, December, 2025

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning.
CoRR, December, 2025

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness.
CoRR, November, 2025

ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling.
CoRR, October, 2025

Scaling Latent Reasoning via Looped Language Models.
CoRR, October, 2025

Teaching Language Models to Reason with Tools.
CoRR, October, 2025

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation.
CoRR, September, 2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling.
CoRR, August, 2025

Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving.
CoRR, August, 2025

CoRT: Code-integrated Reasoning within Thinking.
CoRR, June, 2025

A Survey on Large Language Models for Mathematical Reasoning.
CoRR, June, 2025

Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models.
CoRR, June, 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO.
CoRR, May, 2025

Controlling Large Language Model with Latent Actions.
CoRR, March, 2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques.
CoRR, January, 2025

Enabling Scalable Oversight via Self-Evolving Critic.
CoRR, January, 2025

Controlling Large Language Model with Latent Action.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Adam-mini: Use Fewer Learning Rates To Gain More.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Preserving Diversity in Supervised Fine-Tuning of Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Sensing Jamming Strategy From Limited Observations: An Imitation Learning Perspective.
IEEE Trans. Signal Process., 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity.
CoRR, 2024

Adam-mini: Use Fewer Learning Rates To Gain More.
CoRR, 2024

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation.
CoRR, 2024

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization.
CoRR, 2024

Why Transformers Need Adam: A Hessian Perspective.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

When is RL better than DPO in RLHF? A Representation and Optimization Perspective.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Unlocking Black-Box Prompt Tuning Efficiency via Zeroth-Order Optimization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023
Policy Optimization in RLHF: The Impact of Out-of-preference Data.
CoRR, 2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
CoRR, 2023

Deploying Offline Reinforcement Learning with Human Feedback.
CoRR, 2023

Theoretical Analysis of Offline Imitation With Supplementary Dataset.
CoRR, 2023

Provably Efficient Adversarial Imitation Learning with Unknown Transitions.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Error Bounds of Imitating Policies and Environments for Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis.
CoRR, 2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle.
CoRR, 2022

Rethinking ValueDice: Does It Really Improve Performance?
CoRR, 2022

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions.
CoRR, 2021

2020
Solving the Inverse Design Problem of Electrical Fuse With Machine Learning.
IEEE Access, 2020

Error Bounds of Imitating Policies and Environments.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Efficient Exploration by Novelty-Pursuit.
Proceedings of the Distributed Artificial Intelligence - Second International Conference, 2020

2019
On Value Discrepancy of Imitation Learning.
CoRR, 2019

