Difan Zou

CoRR, October, 2025

Learning under Quantization for High-Dimensional Linear Regression.

Dechen Zhang

CoRR, October, 2025

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

CoRR, October, 2025

Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning.

CoRR, October, 2025

Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory.

Hanru Bai

Weiyang Ding

CoRR, October, 2025

Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks.

CoRR, October, 2025

Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation.

CoRR, October, 2025

Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders.

CoRR, October, 2025

F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning.

CoRR, September, 2025

On the Complexity Theory of Masked Discrete Diffusion: From poly(1/ε) to Nearly ε-Free.

CoRR, September, 2025

On the Collapse Errors Induced by the Deterministic Sampler for Diffusion Models.

CoRR, August, 2025

Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression.

CoRR, August, 2025

STGAN: Spatial-Temporal Graph Autoregression Network for Pavement Distress Deterioration Prediction.

IEEE Trans. Intell. Transp. Syst., July, 2025

Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs.

CoRR, July, 2025

A Random Matrix Analysis of In-context Memorization for Nonlinear Attention.

CoRR, June, 2025

On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models.

Xingwu Chen

Tianle Li

CoRR, June, 2025

Model Unlearning via Sparse Autoencoder Subspace Guided Projections.

CoRR, May, 2025

Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation.

Yi Zhang

CoRR, May, 2025

Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion.

CoRR, May, 2025

Capturing Conditional Dependence via Auto-regressive Diffusion Models.

CoRR, April, 2025

Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks.

CoRR, April, 2025

Per-example gradient regularization improves learning signals from noisy data.

Xuran Meng

Yuan Cao

Mach. Learn., March, 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification.

CoRR, February, 2025

Hyperspherical Energy Transformer with Recurrent Depth.

Yunzhe Hu

Alvaro J. Castro Rivadeneira

Dong Xu

CoRR, February, 2025

Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

CoRR, February, 2025

Masked Autoencoders Are Effective Tokenizers for Diffusion Models.

CoRR, February, 2025

Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Masked Autoencoders Are Effective Tokenizers for Diffusion Models.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

How Does Critical Batch Size Scale in Pre-training?

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

On the Feature Learning in Diffusion Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Parallelized Autoregressive Visual Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Challenges of COVID-19 Case Forecasting in the US, 2020-2021.

Joseph Chadi Lemaitre

Kaitlin Rainwater-Lovett

Ana L. Pastore y Piontti

Alessandro Vespignani

Przemyslaw J. Porebski

Srinivasan Venkatramanan

PLoS Comput. Biol., 2024

Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers.

CoRR, 2024

Towards a Theoretical Understanding of Memorization in Diffusion Models.

CoRR, 2024

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller.

CoRR, 2024

A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models.

Chengxing Xie

CoRR, 2024

The Dog Walking Theory: Rethinking Convergence in Federated Learning.

CoRR, 2024

On the Benefits of Over-parameterization for Out-of-Distribution Generalization.

CoRR, 2024

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems.

Chuan Wu

CoRR, 2024

An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling.

CoRR, 2024

The Implicit Bias of Adam on Separable Data.

Chenyang Zhang

Yuan Cao

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models.

Yunzhe Hu

Dong Xu

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression.

Xingwu Chen

Lei Zhao

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Slight Corruption in Pre-training Data Makes Better Diffusion Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data.

Xuran Meng

Yuan Cao

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Faster Sampling via Stochastic Gradient Proximal Sampler.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference.

Yujin Han

Proceedings of the Forty-first International Conference on Machine Learning, 2024

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks.

Xingwu Chen

Proceedings of the Forty-first International Conference on Machine Learning, 2024

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks.

Chuan Wu

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Optimized Transmit Beamformers for Dual-Function RadCom System.

Proceedings of the IEEE Globecom Workshops 2024, 2024

Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo.

Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

On the Limitation and Experience Replay for GNNs in Continual Learning.

Chuan Wu

Proceedings of the Conference on Lifelong Learning Agents, 2024

2023

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates.

CoRR, 2023

Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks.

CoRR, 2023

Learning High-Dimensional Single-Neuron ReLU Networks with Finite Samples.

CoRR, 2023

The Benefits of Mixup for Feature Learning.

Proceedings of the International Conference on Machine Learning, 2023

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron.

Proceedings of the International Conference on Machine Learning, 2023

Towards Robust Graph Incremental Learning on Evolving Graphs.

Proceedings of the International Conference on Machine Learning, 2023

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

Understanding the Role of Optimization Algorithms in Learning Over-parameterized Models.

PhD thesis, 2022

Two-Dimensional Intensity Distribution and Adaptive Power Allocation for Ultraviolet Ad-Hoc Network.

IEEE Trans. Green Commun. Netw., 2022

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression.

Proceedings of the International Conference on Machine Learning, 2022

Self-training Converts Weak Learners to Strong Learners in Mixture Models.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo.

SIAM J. Sci. Comput., 2021

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling.

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

The Benefits of Implicit Regularization from SGD in Least Squares Problems.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients.

Proceedings of the 38th International Conference on Machine Learning, 2021

Provable Robustness of Adversarial Training for Learning Halfspaces with Noise.

Spencer Frei

Proceedings of the 38th International Conference on Machine Learning, 2021

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate.

Proceedings of the 9th International Conference on Learning Representations, 2021

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Proceedings of the 9th International Conference on Learning Representations, 2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression.

Proceedings of the Conference on Learning Theory, 2021

2020

Gradient descent optimizes over-parameterized deep ReLU networks.

Mach. Learn., 2020

Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate.

CoRR, 2020

On the Global Convergence of Training Deep Linear ResNets.

Philip M. Long

Proceedings of the 8th International Conference on Learning Representations, 2020

Improving Adversarial Robustness Requires Revisiting Misclassified Examples.

Proceedings of the 8th International Conference on Learning Representations, 2020

Two-dimensional Intensity Distribution and Connectivity in Ultraviolet Ad-Hoc Network.

Proceedings of the 2020 IEEE International Conference on Communications, 2020

2019

Signal Characterization and Achievable Transmission Rate of VLC Under Receiver Nonlinearity.

IEEE Access, 2019

Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

An Improved Analysis of Training Over-parameterized Deep Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Signal Detection Under Short-Interval Sampling of Continuous Waveforms for Optical Wireless Scattering Communication.

IEEE Trans. Wirel. Commun., 2018

Secrecy Rate of MISO Optical Wireless Scattering Communications.

IEEE Trans. Commun., 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks.

CoRR, 2018

Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics.

Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Stochastic Variance-Reduced Hamilton Monte Carlo Methods.

Proceedings of the 35th International Conference on Machine Learning, 2018

2017

Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently.

Yaodong Yu

CoRR, 2017

Analysis on Practical Photon Counting Receiver in Optical Scattering Communication.

CoRR, 2017

Characterization of a Practical Photon Counting Receiver in Optical Scattering Communication.

Proceedings of the 2017 IEEE Global Communications Conference, 2017

2016

Turbulence channel modeling and non-parametric estimation for optical wireless scattering communication.

Proceedings of the 2016 IEEE International Conference on Communication Systems, 2016

Performance of non-line-of-sight ultraviolet scattering communication under different altitudes.

Proceedings of the 2016 IEEE/CIC International Conference on Communications in China, 2016

Optical wireless scattering communication system with a non-ideal photon-counting receiver.

Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing, 2016

2014

Improving the NLOS optical scattering channel via beam reshaping.

Shang-Bin Li