Mengdi Wang

ORCID: 0000-0002-2101-9507

According to our database, Mengdi Wang authored at least 162 papers between 2014 and 2025.

Bibliography

2025
Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025.
CoRR, September, 2025

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence.
CoRR, July, 2025

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes.
CoRR, June, 2025

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models.
CoRR, June, 2025

Toward a Theory of Agents as Tool-Use Decision-Makers.
CoRR, June, 2025

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution.
CoRR, May, 2025

On Path to Multimodal Historical Reasoning: HistBench and HistAgent.
CoRR, May, 2025

OTC: Optimal Tool Calls via Reinforcement Learning.
CoRR, April, 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models.
CoRR, April, 2025

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations.
CoRR, February, 2025

Deep Reinforcement Learning for Efficient and Fair Allocation of Healthcare Resources.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Efficient Reinforcement Learning With Impaired Observability: Learning to Act With Delayed and Missing State Observations.
IEEE Trans. Inf. Theory, October, 2024

Redefining the Game: MVAE-DFDPnet's Low-Dimensional Embeddings for Superior Drug-Protein Interaction Predictions.
IEEE J. Biomed. Health Informatics, July, 2024

Teamwork Reinforcement Learning With Concave Utilities.
IEEE Trans. Mob. Comput., May, 2024

Boosting the Convergence of Reinforcement Learning-Based Auto-Pruning Using Historical Data.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2024

Adversarial Attacks on Online Learning to Rank with Stochastic Click Models.
Trans. Mach. Learn. Res., 2024

Author Correction: A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions.
Nat. Mach. Intell., 2024

A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions.
Nat. Mach. Intell., 2024

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control.
J. Mach. Learn. Res., 2024

LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds.
CoRR, 2024

AIME: AI System Optimization via Multiple LLM Evaluators.
CoRR, 2024

Relative-Translation Invariant Wasserstein Distance.
CoRR, 2024

SAIL: Self-Improving Efficient Online Alignment of Large Language Models.
CoRR, 2024

AI Risk Management Should Incorporate Both Safety and Security.
CoRR, 2024

Diffusion Model for Data-Driven Black-Box Optimization.
CoRR, 2024

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory.
CoRR, 2024

Regularized DeepIV with Model Selection.
CoRR, 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences.
CoRR, 2024

Deep reinforcement learning identifies personalized intermittent androgen deprivation therapy for prostate cancer.
Briefings Bioinform., 2024

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Fast Best-of-N Decoding via Speculative Rejection.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Offline Multitask Representation Learning for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Gradient Guidance for Diffusion Models: An Optimization Perspective.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Transfer Q-star: Principled Decoding for LLM Alignment.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

A Theoretical Perspective for Speculative Decoding Algorithm.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Conversational Dueling Bandits in Generalized Linear Models.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Information-Directed Pessimism for Offline Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

MaxMin-RLHF: Alignment with Diverse Human Preferences.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Visual Adversarial Examples Jailbreak Aligned Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Primal-Dual First-Order Methods for Affinely Constrained Multi-block Saddle Point Problems.
SIAM J. Optim., June, 2023

1×N Pattern for Pruning Convolutional Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023

Learning Good State and Action Representations for Markov Decision Process via Tensor Decomposition.
J. Mach. Learn. Res., 2023

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning.
J. Mach. Learn. Res., 2023

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?
CoRR, 2023

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks.
CoRR, 2023

Federated Multi-Level Optimization over Decentralized Networks.
CoRR, 2023

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds.
CoRR, 2023

Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources.
CoRR, 2023

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL.
CoRR, 2023

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems.
CoRR, 2023

Scaling In-Context Demonstrations with Structured Attention.
CoRR, 2023

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks.
CoRR, 2023

Visual Adversarial Examples Jailbreak Large Language Models.
CoRR, 2023

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data.
CoRR, 2023

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

Deep Reinforcement Learning for Cost-Effective Medical Diagnosis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Representation Learning for Low-rank General-sum Markov Games.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Provable Benefits of Representational Transfer in Reinforcement Learning.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

Byzantine-Robust Online and Offline Distributed Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Energy system digitization in the era of AI: A three-layered approach toward carbon neutrality.
Patterns, 2022

Learning Markov Models Via Low-Rank Optimization.
Oper. Res., 2022

Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP.
CoRR, 2022

Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality.
CoRR, 2022

Representation Learning for General-sum Low-rank Markov Games.
CoRR, 2022

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization.
CoRR, 2022

Communication Efficient Distributed Learning for Kernelized Contextual Bandits.
CoRR, 2022

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks.
CoRR, 2022

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning.
CoRR, 2022

Offline stochastic shortest path: Learning, evaluation and towards optimality.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach.
Proceedings of the International Conference on Machine Learning, 2022

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Neural Bandits for Protein Sequence Optimization.
Proceedings of the 56th Annual Conference on Information Sciences and Systems, 2022

Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Cautious Reinforcement Learning via Distributional Risk in the Dual Domain.
IEEE J. Sel. Areas Inf. Theory, 2021

Voting-Based Multiagent Reinforcement Learning for Intelligent IoT.
IEEE Internet Things J., 2021

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient.
CoRR, 2021

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic.
CoRR, 2021

1×N Block Pattern for Network Sparsity.
CoRR, 2021

Bootstrapping Statistical Inference for Off-Policy Evaluation.
CoRR, 2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning Good State and Action Representations via Tensor Decomposition.
Proceedings of the IEEE International Symposium on Information Theory, 2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference.
Proceedings of the 38th International Conference on Machine Learning, 2021

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient.
Proceedings of the 38th International Conference on Machine Learning, 2021

Towards Compact CNNs via Collaborative Compression.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Intermittent Communications in Decentralized Shadow Reward Actor-Critic.
Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), 2021

Beyond Cumulative Returns via Reinforcement Learning over State-Action Occupancy Measures.
Proceedings of the 2021 American Control Conference, 2021

Generalization Bounds for Stochastic Saddle Point Problems.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Online Sparse Reinforcement Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Spectral State Compression of Markov Processes.
IEEE Trans. Inf. Theory, 2020

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains.
SIAM J. Matrix Anal. Appl., 2020

A Single Timescale Stochastic Approximation Method for Nested Stochastic Optimization.
SIAM J. Optim., 2020

Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time.
Math. Oper. Res., 2020

Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations.
CoRR, 2020

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation.
CoRR, 2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Generalized Leverage Score Sampling for Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Fast Training of Deep Learning Models over Multiple GPUs.
Proceedings of the Middleware '20: 21st International Middleware Conference, 2020

Model-Based Reinforcement Learning with Value-Targeted Regression.
Proceedings of the 2nd Annual Conference on Learning for Dynamics and Control, 2020

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound.
Proceedings of the 37th International Conference on Machine Learning, 2020

Model-Based Reinforcement Learning with Value-Targeted Regression.
Proceedings of the 37th International Conference on Machine Learning, 2020

A History-Based Auto-Tuning Framework for Fast and High-Performance DNN Design on GPU.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient.
Proceedings of the 2020 American Control Conference, 2020

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Sketching Transformed Matrices with Applications to Natural Language Processing.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Multilevel Stochastic Gradient Methods for Nested Composition Optimization.
SIAM J. Optim., 2019

Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach.
Math. Program., 2019

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python.
J. Mach. Learn. Res., 2019

Approximation Hardness for A Class of Sparse Optimization Problems.
J. Mach. Learn. Res., 2019

Continuous Control with Contexts, Provably.
CoRR, 2019

Voting-Based Multi-Agent Reinforcement Learning.
CoRR, 2019

Feature-Based Q-Learning for Two-Player Stochastic Games.
CoRR, 2019

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound.
CoRR, 2019

Sample-Optimal Parametric Q-Learning with Linear Transition Models.
CoRR, 2019

Online Factorization and Partition of Complex Networks by Random Walk.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Learning low-dimensional state embeddings and metastable clusters from time series data.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Maximum Likelihood Tensor Decomposition of Markov Decision Process.
Proceedings of the IEEE International Symposium on Information Theory, 2019

Characterizing Deep Learning Training Workloads on Alibaba-PAI.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features.
Proceedings of the 36th International Conference on Machine Learning, 2019

Learning to Control in Metric Space with Optimal Regret.
Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, 2019

2018
Near-optimal stochastic approximation for online principal component estimation.
Math. Program., 2018

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks.
CoRR, 2018

State Aggregation Learning from Markov Transition Data.
CoRR, 2018

Diffusion Approximations for Online Principal Component Estimation and Global Convergence.
CoRR, 2018

Improved Oracle Complexity for Stochastic Compositional Variance Reduced Gradient.
CoRR, 2018

State Compression of Markov Processes via Empirical Low-Rank Estimation.
CoRR, 2018

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes.
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Estimation of Markov Chain via Rank-constrained Likelihood.
Proceedings of the 35th International Conference on Machine Learning, 2018

Efficient Deep Learning Inference Based on Model Compression.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Minimax-Optimal Privacy-Preserving Sparse PCA in Distributed Systems.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Vanishing Price of Decentralization in Large Coordinative Nonconvex Optimization.
SIAM J. Optim., 2017

Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions.
Math. Program., 2017

Dynamic Factorization and Partition of Complex Networks.
CoRR, 2017

Diffusion Approximations for Online Principal Component Estimation and Global Convergence.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Strong NP-Hardness for Sparse Optimization with Concave Penalty Functions.
Proceedings of the 34th International Conference on Machine Learning, 2017

Finite-sum Composition Optimization via Variance Reduced Gradient Descent.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Stochastic First-Order Methods with Random Constraint Projection.
SIAM J. Optim., 2016

A stochastic compositional gradient method using Markov samples.
Proceedings of the Winter Simulation Conference, 2016

An online primal-dual method for discounted Markov decision processes.
Proceedings of the 55th IEEE Conference on Decision and Control, 2016

2015
Incremental constraint projection methods for variational inequalities.
Math. Program., 2015

A Distributed Tracking Algorithm for Reconstruction of Graph Signals.
IEEE J. Sel. Top. Signal Process., 2015

Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints.
CoRR, 2015

Averaging random projection: A fast online solution for large-scale constrained stochastic optimization.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014
Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems.
Math. Oper. Res., 2014

Multi-task nonconvex optimization with total budget constraint: A distributed algorithm using Monte Carlo estimates.
Proceedings of the 19th International Conference on Digital Signal Processing, 2014

Learning distributed jointly sparse systems by collaborative LMS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014
