Cong Fang

CoRR, October, 2025

Conda: Column-Normalized Adam for Training Large Language Models Faster.

CoRR, September, 2025

Hessian-Aware Zeroth-Order Optimization.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2025

Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression.

CoRR, February, 2025

Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition.

CoRR, February, 2025

Fundamental Computational Limits in Pursuing Invariant Causal Prediction and Invariance-Guided Regularization.

CoRR, January, 2025

Learning Curves of Stochastic Gradient Descent in Kernel Regression.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization.

CoRR, 2024

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization.

CoRR, 2024

Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning.

CoRR, 2024

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations.

CoRR, 2024

The Implicit Bias of Heterogeneity towards Invariance and Causality.

Yang Xu

Yihong Gu

CoRR, 2024

The Implicit Bias of Heterogeneity towards Invariance: A Study of Multi-Environment Matrix Sensing.

Yang Xu

Yihong Gu

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Quantum Algorithms and Lower Bounds for Finite-Sum Optimization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization.

CoRR, 2023

CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity.

CoRR, 2023

Task-Robust Pre-Training for Worst-Case Downstream Adaptation.

CoRR, 2023

Policy Representation via Diffusion Probability Model for Reinforcement Learning.

CoRR, 2023

Environment Invariant Linear Least Squares.

CoRR, 2023

Provable Particle-based Primal-Dual Algorithm for Mixed Nash Equilibrium.

CoRR, 2023

Task-Robust Pre-Training for Worst-Case Downstream Adaptation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee.

Yuanshi Liu

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zeroth-order Optimization with Weak Dimension Dependency.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

On the Lower Bound of Minimizing Polyak-Łojasiewicz functions.

Pengyun Yue

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

Convex Formulation of Overparameterized Deep Neural Networks.

IEEE Trans. Inf. Theory, 2022

Training Neural Networks by Lifted Proximal Operator Machines.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Alternating Direction Method of Multipliers for Machine Learning

Huan Li

Springer, ISBN: 978-981-16-9839-2, 2022

2021

Mathematical Models of Overparameterized Neural Networks.

Hanze Dong

Proc. IEEE, 2021

Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks.

CoRR, 2021

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks.

Proceedings of the Conference on Learning Theory, 2021

2020

Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters.

IEEE Trans. Signal Process., 2020

Accelerated First-Order Optimization Algorithms for Machine Learning.

Huan Li

Proc. IEEE, 2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

How to Characterize The Landscape of Overparameterized Convolutional Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Accelerated Optimization for Machine Learning - First-Order Algorithms

Huan Li

Springer, ISBN: 978-981-15-2909-2, 2020

2019

Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations.

Hanze Dong

CoRR, 2019

Learning Compact Partial Differential Equations for Color Images with Efficiency.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points.

Proceedings of the Conference on Learning Theory, 2019

Complexities in Projection-Free Stochastic Non-convex Minimization.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Lifted Proximal Operator Machines.

Jia Li

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Dictionary learning with structured noise.

Neurocomputing, 2018

Hessian-Aware Zeroth-Order Optimization for Black-Box Adversarial Attack.

CoRR, 2018

Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation.

Yameng Huang

CoRR, 2018

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Feature learning via partial differential equation with applications to face recognition.

Pattern Recognit., 2017

Faster and Non-ergodic O(1/K) Stochastic Alternating Direction Method of Multipliers.

Feng Cheng

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization.
