Jeffrey Pennington

Noah Constant

Trans. Mach. Learn. Res., 2024

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability.

[BibT_eX]

[DOI]

CoRR, 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability.

[BibT_eX]

[DOI]

Atish Agarwala

CoRR, 2024

4+3 Phases of Compute-Optimal Neural Scaling Laws.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scaling Exponents Across Parameterizations and Optimizers.

[BibT_eX]

[DOI]

Leslie Pack Kaelbling

Jaehoon Lee

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Small-scale proxies for large-scale Transformer training instabilities.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Temperature check: theory and practice for training models with softmax-cross-entropy losses.

[BibT_eX]

[DOI]

Atish Agarwala

Samuel Stern Schoenholz

Yann N. Dauphin

Trans. Mach. Learn. Res., 2023

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?

[BibT_eX]

[DOI]

CoRR, 2023

Second-order regression models exhibit progressive sharpening to the edge of stability.

[BibT_eX]

[DOI]

Atish Agarwala

Fabian Pedregosa

Proceedings of the International Conference on Machine Learning, 2023

2022

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression.

[BibT_eX]

[DOI]

CoRR, 2022

Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling.

[BibT_eX]

[DOI]

Jiri Hron

Roman Novak

Proceedings of the International Conference on Machine Learning, 2022

Anisotropic Random Feature Regression in High Dimensions.

[BibT_eX]

[DOI]

Gabriel Mel

Proceedings of the Tenth International Conference on Learning Representations, 2022

Online education for data science: Opportunities and challenges.

[BibT_eX]

[DOI]

Proceedings of the AMIA 2022, 2022

A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions.

[BibT_eX]

[DOI]

Jake A. Levinson

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Covariate Shift in High-Dimensional Random Feature Regression.

[BibT_eX]

[DOI]

Nilesh Tripuraneni

CoRR, 2021

Overparameterization Improves Robustness to Covariate Shift in High Dimensions.

[BibT_eX]

[DOI]

Nilesh Tripuraneni

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

Exact posterior distributions of wide Bayesian neural networks.

[BibT_eX]

[DOI]

CoRR, 2020

Finite Versus Infinite Neural Networks: an Empirical Study.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Disentangling Trainability and Generalization in Deep Neural Networks.

[BibT_eX]

[DOI]

Samuel Stern Schoenholz

Proceedings of the 37th International Conference on Machine Learning, 2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization.

[BibT_eX]

[DOI]

Proceedings of the 37th International Conference on Machine Learning, 2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks.

[BibT_eX]

[DOI]

Wei Hu

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Disentangling trainability and generalization in deep learning.

[BibT_eX]

[DOI]

CoRR, 2019

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning.

[BibT_eX]

[DOI]

Jake Levinson

CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.

[BibT_eX]

[DOI]

CoRR, 2019

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs.

[BibT_eX]

[DOI]

CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Mean Field Theory of Batch Normalization.

[BibT_eX]

[DOI]

Greg Yang

Vinay Rao

Proceedings of the 7th International Conference on Learning Representations, 2019

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes.

[BibT_eX]

[DOI]

Proceedings of the 7th International Conference on Learning Representations, 2019

KAMA-NNs: Low-dimensional Rotation Based Neural Networks.

[BibT_eX]

[DOI]

Krzysztof Choromanski

Aldo Pacchiano

Yunhao Tang

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes.

[BibT_eX]

[DOI]

CoRR, 2018

The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network.

[BibT_eX]

[DOI]

Pratik Worah

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10, 000-Layer Vanilla Convolutional Neural Networks.

[BibT_eX]

[DOI]

Yasaman Bahri

Proceedings of the 35th International Conference on Machine Learning, 2018

Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks.

[BibT_eX]

[DOI]

Minmin Chen

Proceedings of the 35th International Conference on Machine Learning, 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study.

[BibT_eX]

[DOI]

Proceedings of the 6th International Conference on Learning Representations, 2018

Deep Neural Networks as Gaussian Processes.

[BibT_eX]

[DOI]

Proceedings of the 6th International Conference on Learning Representations, 2018

The emergence of spectral universality in deep networks.

[BibT_eX]

[DOI]

Surya Ganguli

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017

A Correspondence Between Random Neural Networks and Statistical Field Theory.

[BibT_eX]

[DOI]

CoRR, 2017

Nonlinear random matrix theory for deep learning.

[BibT_eX]

[DOI]

Pratik Worah

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice.

[BibT_eX]

[DOI]

Surya Ganguli

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Geometry of Neural Network Loss Surfaces via Random Matrix Theory.

[BibT_eX]

[DOI]

Yasaman Bahri

Proceedings of the 34th International Conference on Machine Learning, 2017

2016

Clinical Data Research Network Lessons Learned.

[BibT_eX]

[DOI]

Proceedings of the Summit on Clinical Research Informatics, 2016

2015

Spherical Random Features for Polynomial Kernels.

[BibT_eX]

[DOI]

Felix X. Yu

Sanjiv Kumar

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014

Glove: Global Vectors for Word Representation.

[BibT_eX]

[DOI]