Christopher De Sa

CoRR, January, 2026

2025

Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?

CoRR, July, 2025

Model-Preserving Adaptive Rounding.

Albert Tseng

Zhaofeng Sun

CoRR, May, 2025

Extracting memorized pieces of (copyrighted) books from open-weight language models.

CoRR, May, 2025

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Compute-Optimal LLMs Provably Generalize Better with Scale.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.

Trans. Mach. Learn. Res., 2024

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice.

CoRR, 2024

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices.

CoRR, 2024

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes.

CoRR, 2024

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.

CoRR, 2024

STAT: Shrinking Transformers After Training.

CoRR, 2024

QTIP: Quantization with Trellises and Incoherence Processing.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Diffusion Models With Learned Adaptive Noise.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Shadow Cones: A Generalized Framework for Partial Order Embeddings.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Arbitrariness and Social Prediction: The Confounding Role of Variance in Fair Classification.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Decentralized Learning: Theoretical Optimality and Practical Improvements.

J. Mach. Learn. Res., 2023

Report of the 1st Workshop on Generative AI and Law.

CoRR, 2023

ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers.

CoRR, 2023

Shadow Cones: Unveiling Partial Orders in Hyperbolic Space.

CoRR, 2023

Scale up with Order: Finding Good Data Permutations for Distributed Training.

CoRR, 2023

Variance, Self-Consistency, and Arbitrariness in Fair Classification.

CoRR, 2023

Inference for probabilistic dependency graphs.

Oliver E. Richardson

Joseph Y. Halpern

Proceedings of the Uncertainty in Artificial Intelligence, 2023

Neural Caches for Monte Carlo Partial Differential Equation Solvers.

Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Coneheads: Hierarchy Aware Attention.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Riemannian Residual Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

QuIP: 2-Bit Quantization of Large Language Models With Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TART: A plug-and-play Transformer module for task-agnostic reasoning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models.

Proceedings of the International Conference on Machine Learning, 2023

CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.

Proceedings of the International Conference on Machine Learning, 2023

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition.

Proceedings of the International Conference on Machine Learning, 2023

Random Laplacian Features for Learning with Hyperbolic Space.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point.

CoRR, 2022

Non-Determinism and the Lawlessness of ML Code.

Jonathan Frankle

CoRR, 2022

Structured Pruning is All You Need for Pruning CNNs at Initialization.

CoRR, 2022

HyLa: Hyperbolic Laplacian Features For Graph Learning.

CoRR, 2022

Understanding Hyperdimensional Computing for Parallel Single-Pass Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

GraB: Finding Provably Better Data Permutations than Random Reshuffling.

Wentao Guo

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Model Preserving Compression for Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Low-Precision Stochastic Gradient Langevin Dynamics.

Andrew Gordon Wilson

Proceedings of the International Conference on Machine Learning, 2022

How Low Can We Go: Trading Memory for Error in Low-Precision Training.

Proceedings of the Tenth International Conference on Learning Representations, 2022

A General Analysis of Example-Selection for Stochastic Gradient Descent.

Si Yi Meng

Proceedings of the Tenth International Conference on Learning Representations, 2022

Non-Determinism and the Lawlessness of Machine Learning Code.

Jonathan Frankle

Proceedings of the 2022 Symposium on Computer Science and Law, 2022

2021

Pruning Neural Networks with Interpolative Decompositions.

CoRR, 2021

Model Selection's Disparate Impact in Real-World Deep Learning Applications.

CoRR, 2021

Variance Reduction in Training Forecasting Models with Subgroup Sampling.

CoRR, 2021

Low-Precision Reinforcement Learning.

CoRR, 2021

Hyperparameter Optimization Is Deceiving Us, and How to Stop It.

CoRR, 2021

Representing Hyperbolic Space Accurately using Multi-Component Floats.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Equivariant Manifold Flows.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Hyperparameter Optimization Is Deceiving Us, and How to Stop It.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

PipeMare: Asynchronous Pipeline Parallel DNN Training.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

'Tecnologica cosa': Modeling Storyteller Personalities in Boccaccio's 'Decameron'.

Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, 2021

Optimal Complexity in Decentralized Training.

Proceedings of the 38th International Conference on Machine Learning, 2021

Variance Reduced Training with Stratified Sampling for Forecasting Models.

Proceedings of the 38th International Conference on Machine Learning, 2021

Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision.

Proceedings of the 38th International Conference on Machine Learning, 2021

Accuracy-Efficiency Trade-Offs and Accountability in Distributed ML Systems.

Karen Levy

Proceedings of the EAAMO 2021: ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, Virtual Event, USA, October 5, 2021

Meta-Learning Divergences for Variational Inference.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Revisiting BFloat16 Training.

Pedram Zamirai

Jian Zhang

CoRR, 2020

Meta-Learning for Variational Inference.

CoRR, 2020

Regulating Accuracy-Efficiency Trade-Offs in Distributed Machine Learning Systems.

Karen Levy

CoRR, 2020

Towards Optimal Convergence Rate in Decentralized Stochastic Training.

Zheng Li

CoRR, 2020

MixML: A Unified Analysis of Weakly Consistent Parallel Learning.

Jack Nash

CoRR, 2020

Optimizing JPEG Quantization for Classification Networks.

Zhijing Li

Adrian Sampson

CoRR, 2020

Asymptotically Optimal Exact Minibatch Metropolis-Hastings.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Random Reshuffling is Not Always Better.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Neural Manifold Ordinary Differential Equations.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Moniqua: Modulo Quantized Communication in Decentralized SGD.

Proceedings of the 37th International Conference on Machine Learning, 2020

Differentiating through the Fréchet Mean.

Proceedings of the 37th International Conference on Machine Learning, 2020

AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

Cloud-Hosted Intelligence for Real-time IoT Applications.

Ken Birman

Bharath Hariharan

ACM SIGOPS Oper. Syst. Rev., 2019

Overwrite Quantization: Opportunistic Outlier Handling for Neural Network Accelerators.

Ritchie Zhao

Dimitris S. Papailiopoulos

Zhiru Zhang

CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.

Alexandros G. Dimakis

Anastasios Kyrillidis

Shivaram Venkataraman

CoRR, 2019

Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

QPyTorch: A Low-Precision Arithmetic Simulation Framework.

Proceedings of the Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing, 2019

Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Dimension-Free Bounds for Low-Precision Training.

Zheng Li

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Channel Gating Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting.

Proceedings of the 36th International Conference on Machine Learning, 2019

SWALP: Stochastic Weight Averaging in Low Precision Training.

Proceedings of the 36th International Conference on Machine Learning, 2019

A Kernel Theory of Modern Data Augmentation.

Proceedings of the 36th International Conference on Machine Learning, 2019

Distributed Learning with Sublinear Communication.

Proceedings of the 36th International Conference on Machine Learning, 2019

A Formal Framework for Probabilistic Unclean Databases.

Proceedings of the 22nd International Conference on Database Theory, 2019

Building Efficient Deep Neural Networks With Unitary Group Convolutions.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Soft optoelectronic sensory foams with proprioception.

Robert F. Shepherd

Sci. Robotics, 2018

Channel Gating Neural Networks.

CoRR, 2018

High-Accuracy Low-Precision Training.

CoRR, 2018

A Two-pronged Progress in Structured Dense Matrix Vector Multiplication.

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory.

Dan Alistarh

Nikola Konstantinov

Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, 2018

Representation Tradeoffs for Hyperbolic Embeddings.

Proceedings of the 35th International Conference on Machine Learning, 2018

Minibatch Gibbs Sampling on Large Graphical Models.

Vincent Chen

Wing Wong

Proceedings of the 35th International Conference on Machine Learning, 2018

Accelerated Stochastic Power Iteration.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017

Incremental knowledge base construction using DeepDive.

VLDB J., 2017

Flipper: A Systematic Approach to Debugging Training Sets.

Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, 2017

Gaussian Quadrature for Kernel Features.

Tri Dao

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

2016

DeepDive: Declarative Knowledge Base Construction.

SIGMOD Rec., 2016

Parallel SGD: When does averaging help?

CoRR, 2016

Socratic Learning.

CoRR, 2016

Data Programming: Creating Large Training Sets, Quickly.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns.

Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Generating Configurable Hardware from Parallel Patterns.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

Incremental Knowledge Base Construction Using DeepDive.

Proc. VLDB Endow., 2015

Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems.

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems.
