Zhuoran Yang

Suriyanarayanan Vaikuntanathan

Yanyong Zhang

CoRR, February, 2026

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report.

CoRR, January, 2026

Demystifying the Slash Pattern in Attention: The Role of RoPE.

CoRR, January, 2026

Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information.

Manag. Sci., 2026

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design.

J. Mach. Learn. Res., 2026

Probing Audio-Visual Reasoning in Multimodal Language Models through the Lens of Audio.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts.

CoRR, December, 2025

Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering.

CoRR, December, 2025

Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation.

CoRR, October, 2025

Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning.

CoRR, October, 2025

Muon Outperforms Adam in Tail-End Associative Memory Learning.

CoRR, September, 2025

The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS).

Nicolás García Trillos

Cecilia Garraffo

Robert Ghrist

Rafael Gómez-Bombarelli

Aggelos K. Katsaggelos

Christopher Rackauckas

René Vidal

Francisco Villaescusa-Navarro

Yaroslava G. Yingling

CoRR, September, 2025

Kwai Keye-VL 1.5 Technical Report.

CoRR, September, 2025

Kwai Keye-VL Technical Report.

CoRR, July, 2025

Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders.

CoRR, June, 2025

Learning to Lead: Incentivizing Strategic Agents in the Dark.

Yuchen Wu

Xinyi Zhong

CoRR, June, 2025

Quantile-Optimal Policy Learning under Unmeasured Confounding.

CoRR, June, 2025

Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving.

CoRR, April, 2025

LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution.

CoRR, April, 2025

Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework.

CoRR, March, 2025

Nash Equilibrium Constrained Auto-bidding With Bi-level Reinforcement Learning.

CoRR, March, 2025

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation.

CoRR, February, 2025

Active Advantage-Aligned Online Reinforcement Learning with Offline Data.

CoRR, February, 2025

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization.

CoRR, February, 2025

Sample-Efficient Reinforcement Learning from Human Feedback via Information-Directed Sampling.

CoRR, February, 2025

Learning Task Representations from In-Context Learning.

CoRR, February, 2025

Sample-Efficient Reinforcement Learning From Human Feedback via Information-Directed Sampling.

IEEE Trans. Inf. Theory, 2025

Is Pessimism Provably Efficient for Offline Reinforcement Learning?

Ying Jin

Math. Oper. Res., 2025

Multi-Channel Deep Pulse-Coupled Net: A Novel Bearing Fault Diagnosis Framework.

IET Image Process., 2025

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

In-Context Reinforcement Learning From Suboptimal Historical Data.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

An Instrumental Value for Data Production and its Application to Data Pricing.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

Learning Task Representations from In-Context Learning.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

False Correlation Reduction for Offline Reinforcement Learning.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2024

Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning.

Artif. Intell., January, 2024

Neural Temporal Difference and Q Learning Provably Converge to Global Optima.

Math. Oper. Res., 2024

Learning Regularized Graphon Mean-Field Games with Unknown Graphons.

J. Mach. Learn. Res., 2024

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach.

J. Mach. Learn. Res., 2024

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning.

J. Mach. Learn. Res., 2024

Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory.

CoRR, 2024

Physical Informed Driving World Model.

CoRR, 2024

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

CoRR, 2024

Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods.

CoRR, 2024

Provable Statistical Rates for Consistency Diffusion Models.

CoRR, 2024

STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making.

CoRR, 2024

A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations.

CoRR, 2024

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory.

CoRR, 2024

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality.

CoRR, 2024

Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games.

Awni Altabaa

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

How Does Goal Relabeling Improve Sample Efficiency?

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Framework for Sequential Decision-Making under Adaptivity Constraints.

Nuoya Xiong

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF.

Han Shen

Tianyi Chen

Proceedings of the Forty-first International Conference on Machine Learning, 2024

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Theory of Consistency Diffusion Models: Distribution Estimation Meets Fast Sampling.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Sample-Efficient Multi-Agent RL: An Optimization Perspective.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation.

Jianliang He

Han Zhong

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality (extended abstract).

Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

2023

Being Trustworthy is Not Enough: How Untrustworthy Artificial Intelligence (AI) Can Deceive the End-Users and Gain Their Trust.

Proc. ACM Hum. Comput. Interact., April, 2023

A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic.

SIAM J. Optim., March, 2023

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning.

J. Mach. Learn. Res., 2023

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?

J. Mach. Learn. Res., 2023

Empowering Autonomous Driving with Large Language Models: A Safety Perspective.

CoRR, 2023

Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks.

CoRR, 2023

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks.

Siyu Chen

Mengdi Wang

CoRR, 2023

Contextual Dynamic Pricing with Strategic Buyers.

CoRR, 2023

Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism.

Zihao Li

Mengdi Wang

CoRR, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration.

CoRR, 2023

A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations.

CoRR, 2023

Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning.

CoRR, 2023

Partial Discharge Characteristics and Growth Stage Recognition of Electrical Tree in XLPE Insulation.

IEEE Access, 2023

The Sample Complexity of Online Contract Design.

Proceedings of the 24th ACM Conference on Economics and Computation, 2023

Online Performative Gradient Descent for Learning Nash Equilibria in Decision-Dependent Games.

Zihan Zhu

Ethan X. Fang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Learning Regularized Monotone Graphon Mean-Field Games.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning.

Proceedings of the Learning for Dynamics and Control Conference, 2023

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP.

Proceedings of the International Conference on Machine Learning, 2023

Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model.

Proceedings of the International Conference on Machine Learning, 2023

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2023

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments.

Proceedings of the International Conference on Machine Learning, 2023

Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games.

Wenhao Zhan

Jason D. Lee

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Can We Find Nash Equilibria at a Linear Rate in Markov Games?

Zhuoqing Song

Jason D. Lee

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Represent to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning.

Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems, 2023

Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models via Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Offline Policy Optimization in RL with Variance Regularization.

CoRR, 2022

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality.

CoRR, 2022

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond.

CoRR, 2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes.

CoRR, 2022

Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments.

Mengxin Yu

Jianqing Fan

CoRR, 2022

Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency.

CoRR, 2022

Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations.

Qi Cai

CoRR, 2022

The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches.

Grigoris Velegkas

Amin Karbasi

CoRR, 2022

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning.

CoRR, 2022

Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning.

Proceedings of the EC '22: The 23rd ACM Conference on Economics and Computation, Boulder, CO, USA, July 11, 2022

Accelerate online reinforcement learning for building HVAC control with heterogeneous expert guidances.

Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, 2022

Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Unifying Framework of Off-Policy General Value Function Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Reinforcement Learning with Logarithmic Regret and Policy Switches.

Grigoris Velegkas

Amin Karbasi

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exponential Family Model-Based Reinforcement Learning via Score Matching.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets.

Proceedings of the International Conference on Machine Learning, 2022

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation.

Proceedings of the International Conference on Machine Learning, 2022

Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy.

Proceedings of the International Conference on Machine Learning, 2022

Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes.

Proceedings of the International Conference on Machine Learning, 2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation.

Proceedings of the International Conference on Machine Learning, 2022

Adaptive Model Design for Markov Decision Process.

Proceedings of the International Conference on Machine Learning, 2022

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency.

Qi Cai

Proceedings of the International Conference on Machine Learning, 2022

Reinforcement Learning under a Multi-agent Predictive State Representation Model: Method and Theory.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Towards General Function Approximation in Zero-Sum Markov Games.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Gap-Dependent Bounds for Two-Player Markov Games.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents.

IEEE Trans. Autom. Control., 2021

Decentralized multi-agent reinforcement learning with networked agents: recent advances.

Frontiers Inf. Technol. Electron. Eng., 2021

On Finite-Time Convergence of Actor-Critic Algorithm.

IEEE J. Sel. Areas Inf. Theory, 2021

Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis.

Comput. Stat., 2021

Generalized estimating equations for analyzing multivariate survival data.

Commun. Stat. Simul. Comput., 2021

Exponential Family Model-Based Reinforcement Learning via Score Matching.

CoRR, 2021

Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

CoRR, 2021

ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning.

CoRR, 2021

SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning.

CoRR, 2021

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs.

CoRR, 2021

Inducing Equilibria via Incentives: Simultaneous Design-and-Play Finds Global Optima.

CoRR, 2021

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation.

CoRR, 2021

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning.

CoRR, 2021

A Unified Off-Policy Evaluation Approach for General Value Function.

CoRR, 2021

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning.

CoRR, 2021

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization.

CoRR, 2021

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data.

Lingxiao Wang

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

BooVI: Provably Efficient Bootstrapped Value Iteration.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Sample Efficient Reinforcement Learning in Competitive Linear Quadratic Systems.

Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, 2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality.

Proceedings of the 38th International Conference on Machine Learning, 2021

Learning While Playing in Mean-Field Games: Convergence and Optimality.

Proceedings of the 38th International Conference on Machine Learning, 2021

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time.

Proceedings of the 38th International Conference on Machine Learning, 2021

Reinforcement Learning for Cost-Aware Markov Decision Processes.

Proceedings of the 38th International Conference on Machine Learning, 2021

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game.

Proceedings of the 38th International Conference on Machine Learning, 2021

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions.

Proceedings of the 38th International Conference on Machine Learning, 2021

Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport.

Proceedings of the 38th International Conference on Machine Learning, 2021

Is Pessimism Provably Efficient for Offline RL?

Ying Jin

Proceedings of the 38th International Conference on Machine Learning, 2021

Randomized Exploration in Reinforcement Learning with General Value Function Approximation.

Proceedings of the 38th International Conference on Machine Learning, 2021

Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games.

Proceedings of the 38th International Conference on Machine Learning, 2021

Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach.

Yingjie Fei

Proceedings of the 38th International Conference on Machine Learning, 2021

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy.

Zuyue Fu

Proceedings of the 9th International Conference on Learning Representations, 2021

Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case.

Yufeng Zhang

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Sample Elicitation.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

A Novel Model Integrating Deep Learning for Land Use/Cover Change Reconstruction: A Case Study of Zhenlai County, Northeast China.

Remote. Sens., 2020

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy.

CoRR, 2020

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization.

CoRR, 2020

Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations.

CoRR, 2020

Provable Fictitious Play for General Mean-Field Games.

CoRR, 2020

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning.

CoRR, 2020

Understanding Implicit Regularization in Over-Parameterized Nonlinear Statistical Model.

Jianqing Fan

Mengxin Yu

CoRR, 2020

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic.

CoRR, 2020

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach.

CoRR, 2020

Neural Certificates for Safe Control Policies.

CoRR, 2020

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate.

CoRR, 2020

Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions.

CoRR, 2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Efficient Neural GTD for Off-Policy Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

A Theoretical Analysis of Deep Q-Learning.

Proceedings of the 2nd Annual Conference on Learning for Dynamics and Control, 2020

Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate.

Proceedings of the 37th International Conference on Machine Learning, 2020

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning.

Lingxiao Wang

Proceedings of the 37th International Conference on Machine Learning, 2020

On the Global Optimality of Model-Agnostic Meta-Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis.

Shuang Qiu

Xiaohan Wei

Proceedings of the 37th International Conference on Machine Learning, 2020

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees.

Proceedings of the 37th International Conference on Machine Learning, 2020

Provably Efficient Exploration in Policy Optimization.

Proceedings of the 37th International Conference on Machine Learning, 2020

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence.

Proceedings of the 8th International Conference on Learning Representations, 2020

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games.

Proceedings of the 8th International Conference on Learning Representations, 2020

On Computation and Generalization of Generative Adversarial Imitation Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium.

Proceedings of the Conference on Learning Theory, 2020

Provably efficient reinforcement learning with linear function approximation.

Proceedings of the Conference on Learning Theory, 2020

2019

Misspecified nonconvex statistical optimization for sparse phase retrieval.

Math. Program., 2019

High-dimensional Varying Index Coefficient Models via Stein's Identity.

J. Mach. Learn. Res., 2019

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator.

CoRR, 2019

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms.

CoRR, 2019

Credible Sample Elicitation by Deep Learning, for Deep Learning.

CoRR, 2019

Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rates and Global Landscape Analysis.

Shuang Qiu

Xiaohan Wei

CoRR, 2019

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization.

CoRR, 2019

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost.

CoRR, 2019

Stochastic Convergence Results for Regularized Actor-Critic Methods.

CoRR, 2019

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy.

CoRR, 2019

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning.

CoRR, 2019

A Theoretical Analysis of Deep Q-Learning.

Yuchen Xie

CoRR, 2019

Surface Charge Transport Characteristics of ZnO/Silicone Rubber Composites Under Impulse Superimposed on DC Voltage.

Zhonglei Li

Boxue Du

IEEE Access, 2019

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Convergent Policy Optimization for Safe Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Statistical-Computational Tradeoff in Single Index Models.

Lingxiao Wang

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Variance Reduced Policy Evaluation with Smooth Function Approximation.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neural Temporal-Difference Learning Converges to Global Optima.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the statistical rate of nonlinear recovery in generative models with heavy-tailed data.

Xiaohan Wei

Proceedings of the 36th International Conference on Machine Learning, 2019

Research Character Analyzation of Urban Security Based on Urban Resilience Using Big Data Method.

Proceedings of the Big Data and Security - First International Conference, 2019

Learning Partially Observable Markov Decision Processes Using Coupled Canonical Polyadic Decomposition.

Proceedings of the IEEE Data Science Workshop, 2019

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning.

Proceedings of the 58th IEEE Conference on Decision and Control, 2019

Design of Single Channel Speech Separation System Based on Deep Clustering Model.

Proceedings of the 18th IEEE/ACIS International Conference on Computer and Information Science, 2019

2018

On Semiparametric Exponential Family Graphical Models.

Yang Ning

J. Mach. Learn. Res., 2018

Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning.

CoRR, 2018

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space.

CoRR, 2018

Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval.

CoRR, 2018

Provable Gaussian Embedding with One Observation.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Contrastive Learning from Pairwise Measurements.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents.

Proceedings of the 35th International Conference on Machine Learning, 2018

Networked Multi-Agent Reinforcement Learning in Continuous Spaces.

Proceedings of the 57th IEEE Conference on Decision and Control, 2018

A Finite Sample Analysis of the Actor-Critic Algorithm.

Proceedings of the 57th IEEE Conference on Decision and Control, 2018

Nonlinear Structured Signal Estimation in High Dimensions via Iterative Hard Thresholding.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017

Misspecified Nonconvex Statistical Optimization for Phase Retrieval.

CoRR, 2017

Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein's Lemma.

Krishnakumar Balasubramanian

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

High-dimensional Non-Gaussian Single Index Models via Thresholded Score Function Estimation.

Krishnakumar Balasubramanian

Proceedings of the 34th International Conference on Machine Learning, 2017

2016

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning.

Xinyang Yi

Constantine Caramanis