Jason D. Lee

Orcid: 0000-0003-0064-7800

Affiliations:
  • Stanford University, Institute for Computational and Mathematical Engineering


According to our database, Jason D. Lee authored at least 163 papers between 2007 and 2024.


Bibliography

2024
Horizon-Free Regret for Linear Markov Decision Processes.
CoRR, 2024

Computational-Statistical Gaps in Gaussian Single-Index Models.
CoRR, 2024

How Well Can Transformers Emulate In-context Newton's Method?
CoRR, 2024

How Transformers Learn Causal Structure with Gradient Descent.
CoRR, 2024

LoRA Training in the NTK Regime has No Spurious Local Minima.
CoRR, 2024

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark.
CoRR, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit.
CoRR, 2024

An Information-Theoretic Analysis of In-Context Learning.
CoRR, 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.
CoRR, 2024

2023
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence.
SIAM J. Optim., June, 2023

Towards Optimal Statistical Watermarking.
CoRR, 2023

Optimal Multi-Distribution Learning.
CoRR, 2023

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.
CoRR, 2023

Learning Hierarchical Polynomials with Three-Layer Neural Networks.
CoRR, 2023

Provably Efficient CVaR RL in Low-rank MDPs.
CoRR, 2023

REST: Retrieval-Based Speculative Decoding.
CoRR, 2023

Settling the Sample Complexity of Online Reinforcement Learning.
CoRR, 2023

Teaching Arithmetic to Small Transformers.
CoRR, 2023

Scaling In-Context Demonstrations with Structured Attention.
CoRR, 2023

Solving Robust MDPs through No-Regret Dynamics.
CoRR, 2023

How to Query Human Feedback Efficiently in RL?
CoRR, 2023

Reward Collapse in Aligning Large Language Models.
CoRR, 2023

Provable Offline Reinforcement Learning with Human Feedback.
CoRR, 2023

Refined Value-Based Offline RL under Realizability and Partial Coverage.
CoRR, 2023

Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fine-Tuning Language Models with Just Forward Passes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Regret Guarantees for Online Deep Control.
Proceedings of the Learning for Dynamics and Control Conference, 2023

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.
Proceedings of the International Conference on Machine Learning, 2023

Looped Transformers as Programmable Computers.
Proceedings of the International Conference on Machine Learning, 2023

Efficient displacement convex optimization with particle gradient descent.
Proceedings of the International Conference on Machine Learning, 2023

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

PAC Reinforcement Learning for Predictive State Representations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Can We Find Nash Equilibria at a Linear Rate in Markov Games?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Provably Efficient Reinforcement Learning via Surprise Bound.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Optimal Sample Complexity Bounds for Non-convex Optimization under Kurdyka-Lojasiewicz Condition.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Provable Hierarchy-Based Meta-Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Reconstructing Training Data from Model Gradient, Provably.
CoRR, 2022

Neural Networks can Learn Representations with Gradient Descent.
CoRR, 2022

Nearly Minimax Algorithms for Linear Bandits with Shared Representation.
CoRR, 2022

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards General Function Approximation in Zero-Sum Markov Games.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Competitive Multi-Agent Reinforcement Learning with Self-Supervised Representation.
Proceedings of the IEEE International Conference on Acoustics, 2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Optimization-Based Separations for Neural Networks.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Neural Networks can Learn Representations with Gradient Descent.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Linearized ADMM Converges to Second-Order Stationary Points for Non-Convex Problems.
IEEE Trans. Signal Process., 2021

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift.
J. Mach. Learn. Res., 2021

Provable Regret Bounds for Deep Online Learning and Control.
CoRR, 2021

A Short Note on the Relationship of Information Gain and Eluder Dimension.
CoRR, 2021

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning.
CoRR, 2021

Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games.
CoRR, 2021

Predicting What You Already Know Helps: Provable Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Going Beyond Linear RL: Sample Efficient Neural Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Label Noise SGD Provably Prefers Flat Global Minimizers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

How Fine-Tuning Allows for Effective Meta-Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Near-Optimal Linear Regression under Distribution Shift.
Proceedings of the 38th International Conference on Machine Learning, 2021

Bilinear Classes: A Structural Framework for Provable Generalization in RL.
Proceedings of the 38th International Conference on Machine Learning, 2021

A Theory of Label Propagation for Subpopulation Shift.
Proceedings of the 38th International Conference on Machine Learning, 2021

How Important is the Train-Validation Split in Meta-Learning?
Proceedings of the 38th International Conference on Machine Learning, 2021

Impact of Representation Learning in Linear Bandits.
Proceedings of the 9th International Conference on Learning Representations, 2021

Few-Shot Learning via Learning the Representation, Provably.
Proceedings of the 9th International Conference on Learning Representations, 2021

Shape Matters: Understanding the Implicit Bias of the Noise Covariance.
Proceedings of the Conference on Learning Theory, 2021

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

2020
Stochastic Subgradient Method Converges on Tame Functions.
Found. Comput. Math., 2020

Provable Benefits of Representation Learning in Linear Bandits.
CoRR, 2020

Distributed Estimation for Principal Component Analysis: a Gap-free Approach.
CoRR, 2020

Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting.
CoRR, 2020

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity.
CoRR, 2020

Beyond Lazy Training for Over-parameterized Tensor Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Generalized Leverage Score Sampling for Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

How to Characterize The Landscape of Overparameterized Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Agnostic Q-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Understanding Hierarchical Learning: Benefits of Neural Representations.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Optimal transport mapping via input convex neural networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

SGD Learns One-Layer Networks in WGANs.
Proceedings of the 37th International Conference on Machine Learning, 2020

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

Kernel and Rich Regimes in Overparametrized Models.
Proceedings of the Conference on Learning Theory, 2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes.
Proceedings of the Conference on Learning Theory, 2020

2019
Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks.
IEEE Trans. Inf. Theory, 2019

First-order methods almost always avoid strict saddle points.
Math. Program., 2019

When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima?
CoRR, 2019

Incremental Methods for Weakly Convex Optimization.
CoRR, 2019

Convergence of Adversarial Training in Overparametrized Networks.
CoRR, 2019

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Convergence of Adversarial Training in Overparametrized Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Neural Temporal-Difference Learning Converges to Global Optima.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models.
Proceedings of the 36th International Conference on Machine Learning, 2019

Gradient Descent Finds Global Minima of Deep Neural Networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

Convergence of Gradient Descent on Separable Data.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition.
CoRR, 2018

On the Margin Theory of Feedforward Neural Networks.
CoRR, 2018

Provably Correct Automatic Subdifferentiation for Qualified Programs.
CoRR, 2018

Convergence of Gradient Descent on Separable Data.
CoRR, 2018

Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization.
CoRR, 2018

Solving Approximate Wasserstein GANs to Stationarity.
CoRR, 2018

On the Convergence and Robustness of Training GANs with Regularized Optimal Transport.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adding One Neuron Can Eliminate All Bad Local Minima.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Provably Correct Automatic Sub-Differentiation for Qualified Programs.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Implicit Bias of Gradient Descent on Linear Convolutional Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

Characterizing Implicit Bias in Terms of Optimization Geometry.
Proceedings of the 35th International Conference on Machine Learning, 2018

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima.
Proceedings of the 35th International Conference on Machine Learning, 2018

On the Power of Over-parametrization in Neural Networks with Quadratic Activation.
Proceedings of the 35th International Conference on Machine Learning, 2018

No Spurious Local Minima in a Two Hidden Unit ReLU Network.
Proceedings of the 6th International Conference on Learning Representations, 2018

When is a Convolutional Filter Easy to Learn?
Proceedings of the 6th International Conference on Learning Representations, 2018

Learning One-hidden-layer Neural Networks with Landscape Design.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Communication-efficient Sparse Regression.
J. Mach. Learn. Res., 2017

Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement.
J. Mach. Learn. Res., 2017

First-order Methods Almost Always Avoid Saddle Points.
CoRR, 2017

An inexact subsampled proximal Newton-type method for large-scale machine learning.
CoRR, 2017

A Flexible Framework for Hypothesis Testing in High-dimensions.
CoRR, 2017

Gradient Descent Can Take Exponential Time to Escape Saddle Points.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

On the Learnability of Fully-Connected Neural Networks.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Black-box Importance Sampling.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Gradient Descent Converges to Minimizers.
CoRR, 2016

Communication-efficient distributed statistical learning.
CoRR, 2016

Matrix Completion has No Spurious Local Minimum.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time.
Proceedings of the 33rd International Conference on Machine Learning, 2016

A Kernelized Stein Discrepancy for Goodness-of-fit Tests.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Gradient Descent Only Converges to Minimizers.
Proceedings of the 29th Conference on Learning Theory, 2016

2015
Matrix completion and low-rank SVD via fast alternating least squares.
J. Mach. Learn. Res., 2015

Learning Halfspaces and Neural Networks with Random Initialization.
CoRR, 2015

ℓ₁-regularized Neural Networks are Improperly Learnable in Polynomial Time.
CoRR, 2015

Communication-efficient sparse regression: a one-shot approach.
CoRR, 2015

Distributed Stochastic Variance Reduced Gradient Methods.
CoRR, 2015

Selective Inference and Learning Mixed Graphical Models.
CoRR, 2015

Evaluating the statistical significance of biclusters.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014
Proximal Newton-Type Methods for Minimizing Composite Functions.
SIAM J. Optim., 2014

Exact Post Model Selection Inference for Marginal Screening.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2013
On model selection consistency of regularized M-estimators.
CoRR, 2013

On model selection consistency of penalized M-estimators: a geometric theory.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Using multiple samples to learn mixture models.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Structure Learning of Mixed Graphical Models.
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, 2013

2012
Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form.
CoRR, 2012

Learning Mixed Graphical Models.
CoRR, 2012

Proximal Newton-type methods for convex optimization.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

2010
Practical Large-Scale Optimization for Max-norm Regularization.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

2009
A distributed concurrent on-line test scheduling protocol for many-core NoC-based systems.
Proceedings of the 27th International Conference on Computer Design, 2009

2008
An On-Demand Test Triggering Mechanism for NoC-Based Safety-Critical Systems.
Proceedings of the 9th International Symposium on Quality of Electronic Design (ISQED 2008), 2008

In-field NoC-based SoC testing with distributed test vector storage.
Proceedings of the 26th International Conference on Computer Design, 2008

2007
SAPP: scalable and adaptable peak power management in nocs.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

A Safety Analysis Framework for COTS Microprocessors in Safety-Critical Applications.
Proceedings of the Tenth IEEE International Symposium on High Assurance Systems Engineering (HASE 2007), 2007

