Zeyuan Allen Zhu

CoRR, December, 2025

Physics of Language Models: Part 1, Learning Hierarchical Language Structures.

[BibT_eX]

[DOI]

Trans. Mach. Learn. Res., 2025

Making Hard Problems Easier with Custom Data Distributions and Loss Regularization: A Case Study in Modular Arithmetic.

[BibT_eX]

[DOI]

IACR Cryptol. ePrint Arch., 2025

Physics of Language Models: Part 3.2, Knowledge Manipulation.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Reverse Training to Nurse the Reversal Curse.

[BibT_eX]

[DOI]

CoRR, 2024

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction.

[BibT_eX]

[DOI]

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

SALSA VERDE: a machine learning attack on Learning with Errors with sparse small secrets.

[BibT_eX]

[DOI]

IACR Cryptol. ePrint Arch., 2023

Physics of Language Models: Part 1, Context-Free Grammar.

[BibT_eX]

[DOI]

CoRR, 2023

SALSA VERDE: a machine learning attack on LWE with sparse small secrets.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning.

[BibT_eX]

[DOI]

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

LoRA: Low-Rank Adaptation of Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Near-optimal discrete optimization for experimental design: a regret minimization approach.

[BibT_eX]

[DOI]

Math. Program., 2021

LoRA: Low-Rank Adaptation of Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2021

Byzantine-Resilient Non-Convex Stochastic Gradient Descent.

[BibT_eX]

[DOI]

Faeze Ebrahimianghazani

Jerry Li

Dan Alistarh

Proceedings of the 9th International Conference on Learning Representations, 2021

Feature Purification: How Adversarial Training Performs Robust Deep Learning.

[BibT_eX]

[DOI]

Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

2020

Backward Feature Correction: How Deep Learning Performs Deep Learning.

[BibT_eX]

[DOI]

CoRR, 2020

2019

Nearly linear-time packing and covering LP solvers - Achieving width-independence and -convergence.

[BibT_eX]

[DOI]

Math. Program., 2019

On the Convergence Rate of Training Recurrent Neural Networks.

[BibT_eX]

[DOI]

Zhao Song

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.

[BibT_eX]

[DOI]

Yingyu Liang

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Convergence Theory for Deep Learning via Over-Parameterization.

[BibT_eX]

[DOI]

Zhao Song

Proceedings of the 36th International Conference on Machine Learning, 2019

2018

How To Make the Gradients Small Stochastically.

[BibT_eX]

[DOI]

CoRR, 2018

Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing.

[BibT_eX]

[DOI]

Ankit Garg

Rafael Mendes de Oliveira

Avi Wigderson

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Is Q-Learning Provably Efficient?

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Lingering of Gradients: How to Reuse Gradients Over Time.

[BibT_eX]

[DOI]

David Simchi-Levi

Xinshang Wang

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

NEON2: Finding Local Minima via First-Order Oracles.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Natasha 2: Faster Non-Convex Optimization Than SGD.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Byzantine Stochastic Gradient Descent.

[BibT_eX]

[DOI]

Dan Alistarh

Jerry Li

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits.

[BibT_eX]

[DOI]

Sébastien Bubeck

Proceedings of the 35th International Conference on Machine Learning, 2018

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization.

[BibT_eX]

[DOI]

Proceedings of the 35th International Conference on Machine Learning, 2018

2017

Follow the Compressed Leader: Faster Algorithms for Matrix Multiplicative Weight Updates.

[BibT_eX]

[DOI]

CoRR, 2017

Katyusha: the first direct acceleration of stochastic gradient methods.

[BibT_eX]

[DOI]

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Finding approximate local minima faster than gradient descent.

[BibT_eX]

[DOI]

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent.

[BibT_eX]

[DOI]

Proceedings of the 8th Innovations in Theoretical Computer Science Conference, 2017

Near-Optimal Design of Experiments via Regret Minimization.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Machine Learning, 2017

Much Faster Algorithms for Matrix Scaling.

[BibT_eX]

[DOI]

Rafael Mendes de Oliveira

Avi Wigderson

Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate.

[BibT_eX]

[DOI]

Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

2016

Reconstructing Markov processes from independent and anonymous experiments.

[BibT_eX]

[DOI]

Discret. Appl. Math., 2016

Faster Principal Component Regression via Optimal Polynomial Approximation to sgn(x).

[BibT_eX]

[DOI]

CoRR, 2016

Fast Global Convergence of Online PCA.

[BibT_eX]

[DOI]

CoRR, 2016

Katyusha: Accelerated Variance Reduction for Faster SGD.

[BibT_eX]

[DOI]

CoRR, 2016

Finding Approximate Local Minima for Nonconvex Optimization in Linear Time.

[BibT_eX]

[DOI]

CoRR, 2016

Expanders via Local Edge Flips.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

Using Optimization to Obtain a Width-Independent, Parallel, Simpler, and Faster Positive SDP Solver.

[BibT_eX]

[DOI]

Yin Tat Lee

Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters.

[BibT_eX]

[DOI]

Karthik Sridharan

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Even Faster SVD Decomposition Yet Without Agonizing Pain.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Optimal Black-Box Reductions Between Optimization Objectives.

[BibT_eX]

[DOI]

Elad Hazan

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives.

[BibT_eX]

[DOI]

Proceedings of the 33nd International Conference on Machine Learning, 2016

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling.

[BibT_eX]

[DOI]

Proceedings of the 33nd International Conference on Machine Learning, 2016

Variance Reduction for Faster Non-Convex Optimization.

[BibT_eX]

[DOI]

Elad Hazan

Proceedings of the 33nd International Conference on Machine Learning, 2016

Optimization Algorithms for Faster Computational Geometry.

[BibT_eX]

[DOI]

Zhenyu Liao

Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, 2016

2015

Shorter arithmetization of nondeterministic computations.

[BibT_eX]

[DOI]

Theor. Comput. Sci., 2015

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling.

[BibT_eX]

[DOI]

CoRR, 2015

UniVR: A Universal Variance Reduction Framework for Proximal Stochastic Gradient Method.

[BibT_eX]

[DOI]

CoRR, 2015

Nearly-Linear Time Positive LP Solver with Faster Convergence Rate.

[BibT_eX]

[DOI]

Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, 2015

Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates.

[BibT_eX]

[DOI]

Zhenyu Liao

Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, 2015

Using Optimization to Break the Epsilon Barrier: A Faster and Simpler Width-Independent Algorithm for Solving Positive Linear Programs in Parallel.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 2015

Restricted Isometry Property for General p-Norms.

[BibT_eX]

[DOI]

Rati Gelashvili

Ilya P. Razenshteyn

Proceedings of the 31st International Symposium on Computational Geometry, 2015

2014

Nearly-Linear Time Packing and Covering LP Solver with Faster Convergence Rate Than $O(1/\varepsilon^2)$.

[BibT_eX]

[DOI]

CoRR, 2014

A Novel, Simple Interpretation of Nesterov's Accelerated Method as a Combination of Gradient and Mirror Descent.

[BibT_eX]

[DOI]

CoRR, 2014

Using Optimization to Find Maximum Inscribed Balls and Minimum Enclosing Balls.

[BibT_eX]

[DOI]

Zhenyu Liao

CoRR, 2014

Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints.

[BibT_eX]

[DOI]

CoRR, 2014

Knightian Robustness of the Vickrey Mechanism.

[BibT_eX]

[DOI]

CoRR, 2014

Knightian Robustness of Single-Parameter Domains.

[BibT_eX]

[DOI]

CoRR, 2014

Knightian Analysis of the VCG Mechanism in Unrestricted Combinatorial Auctions.

[BibT_eX]

[DOI]

CoRR, 2014

Knightian Robustness from Regret Minimization.

[BibT_eX]

[DOI]

CoRR, 2014

Bridging Utility Maximization and Regret Minimization.

[BibT_eX]

[DOI]

CoRR, 2014

Flow-Based Algorithms for Local Graph Clustering.

[BibT_eX]

[DOI]

Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014

Knightian self uncertainty in the vcg mechanism for unrestricted combinatorial auctions.

[BibT_eX]

[DOI]

Proceedings of the ACM Conference on Economics and Computation, 2014

2013

A simple, combinatorial algorithm for solving SDD systems in nearly-linear time.

[BibT_eX]

[DOI]

Proceedings of the Symposium on Theory of Computing Conference, 2013

A Local Algorithm for Finding Well-Connected Clusters.

[BibT_eX]

[DOI]

Silvio Lattanzi

Vahab S. Mirrokni

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

Randomized accuracy-aware program transformations for efficient approximate computations.

[BibT_eX]

[DOI]

Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2012

Mechanism design with approximate valuations.

[BibT_eX]

[DOI]

Proceedings of the Innovations in Theoretical Computer Science 2012, 2012

2011

Knightian Auctions

[BibT_eX]

[DOI]

CoRR, 2011

Optimal Pricing in Social Networks with Incomplete Information.

[BibT_eX]

[DOI]

Proceedings of the Internet and Network Economics - 7th International Workshop, 2011

2010

Survey & Experiment: Towards the Learning Accuracy

[BibT_eX]

[DOI]