We stand with Ukraine

Sashank J. Reddi

Affiliations:

Carnegie Mellon University, Machine Learning Department

According to our database¹, Sashank J. Reddi authored at least 78 papers between 2010 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Links

Online presence:

on cs.cmu.edu

Bibliography

2025

Structured Preconditioners in Adaptive Optimization: A Unified Analysis.

[DOI]

,

,

Sashank J. Reddi

,

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation.

[DOI]

,

,

Harikrishna Narasimhan

,

Aditya Krishna Menon

,

Wittawat Jitkrittum

,

,

Sashank J. Reddi

,

,

MohammadHossein Bateni

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Reasoning with Latent Thoughts: On the Power of Looped Transformers.

[DOI]

,

Nishanth Dikkala

,

,

,

Sashank J. Reddi

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficient stagewise pretraining via progressive subnetworks.

[DOI]

Abhishek Panigrahi

,

,

,

Sobhan Miryoosefi

,

Sashank J. Reddi

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

On the Role of Depth and Looping for In-Context Learning with Task Diversity.

[DOI]

Khashayar Gatmiry

,

,

Sashank J. Reddi

,

Stefanie Jegelka

,

CoRR, 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs.

[DOI]

Ankit Singh Rawat

,

Veeranjaneyulu Sadhanala

,

Afshin Rostamizadeh

,

Ayan Chakrabarti

,

Wittawat Jitkrittum

,

Vladimir Feinberg

,

,

Hrayr Harutyunyan

,

,

,

Rakesh Shivanna

,

Sashank J. Reddi

,

Aditya Krishna Menon

,

,

CoRR, 2024

Efficient Document Ranking with Learnable Late Interactions.

[DOI]

,

,

,

Sashank J. Reddi

,

Sadeep Jayasumana

,

Ankit Singh Rawat

,

Aditya Krishna Menon

,

,

CoRR, 2024

Landscape-Aware Growing: The Power of a Little LAG.

[DOI]

,

,

Sobhan Miryoosefi

,

Sashank J. Reddi

,

CoRR, 2024

On the Inductive Bias of Stacking Towards Improving Reasoning.

[DOI]

,

,

Shankar Krishnan

,

Sobhan Miryoosefi

,

Sashank Jakkam Reddi

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

[DOI]

Khashayar Gatmiry

,

,

Sashank J. Reddi

,

Stefanie Jegelka

,

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Simplicity Bias via Global Convergence of Sharpness Minimization.

[DOI]

Khashayar Gatmiry

,

,

Sashank J. Reddi

,

Stefanie Jegelka

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.

[DOI]

Khashayar Gatmiry

,

,

Ching-Yao Chuang

,

Sashank J. Reddi

,

,

Stefanie Jegelka

CoRR, 2023

Depth Dependence of μP Learning Rates in ReLU MLPs.

[DOI]

,

,

,

Sashank J. Reddi

,

Srinadh Bhojanapalli

,

CoRR, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.

[DOI]

Khashayar Gatmiry

,

,

,

Sashank J. Reddi

,

Stefanie Jegelka

,

Ching-Yao Chuang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient Training of Language Models using Few-Shot Learning.

[DOI]

Sashank J. Reddi

,

Sobhan Miryoosefi

,

,

Shankar Krishnan

,

,

,

Proceedings of the International Conference on Machine Learning, 2023

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers.

[DOI]

,

,

Srinadh Bhojanapalli

,

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

,

,

,

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Differentially Private Adaptive Optimization with Delayed Preconditioners.

[DOI]

,

,

,

Sashank J. Reddi

,

Hugh Brendan McMahan

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

On the Algorithmic Stability and Generalization of Adaptive Optimization Methods.

[DOI]

,

,

Sashank J. Reddi

,

Barnabás Póczos

CoRR, 2022

Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers.

[DOI]

,

,

Srinadh Bhojanapalli

,

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

,

,

,

,

CoRR, 2022

FedLite: A Scalable Approach for Federated Learning on Resource-constrained Clients.

[DOI]

,

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

,

,

CoRR, 2022

In defense of dual-encoders for neural ranking.

[DOI]

Aditya Krishna Menon

,

Sadeep Jayasumana

,

Ankit Singh Rawat

,

,

Sashank J. Reddi

,

Proceedings of the International Conference on Machine Learning, 2022

Private Adaptive Optimization with Side information.

[DOI]

,

,

Sashank J. Reddi

,

Proceedings of the International Conference on Machine Learning, 2022

Robust Training of Neural Networks Using Scale Invariant Architectures.

[DOI]

,

Srinadh Bhojanapalli

,

,

Sashank J. Reddi

,

Proceedings of the International Conference on Machine Learning, 2022

2021

A Field Guide to Federated Optimization.

[DOI]

,

Zachary Charles

,

,

,

H. Brendan McMahan

,

Blaise Agüera y Arcas

,

Maruan Al-Shedivat

,

,

Salman Avestimehr

,

,

,

Suhas N. Diggavi

,

,

Advait Gadhikar

,

Zachary Garrett

,

Antonious M. Girgis

,

,

,

,

Samuel Horváth

,

,

,

,

,

,

,

Sai Praneeth Karimireddy

,

,

,

,

,

,

,

Sashank J. Reddi

,

Peter Richtárik

,

,

,

Mahdi Soltanolkotabi

,

,

Ananda Theertha Suresh

,

Sebastian U. Stich

,

Ameet Talwalkar

,

,

Blake E. Woodworth

,

,

,

,

,

,

,

Chunxiang Zheng

,

,

CoRR, 2021

Distilling Double Descent.

[DOI]

,

Aditya Krishna Menon

,

Harikrishna Narasimhan

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

CoRR, 2021

Efficient Training of Retrieval Models using Negative Cache.

[DOI]

,

Sashank J. Reddi

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Breaking the centralized barrier for cross-device federated learning.

[DOI]

Sai Praneeth Karimireddy

,

,

,

,

Sashank J. Reddi

,

Sebastian U. Stich

,

Ananda Theertha Suresh

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Federated Composite Optimization.

[DOI]

,

,

Sashank J. Reddi

Proceedings of the 38th International Conference on Machine Learning, 2021

Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces.

[DOI]

Ankit Singh Rawat

,

Aditya Krishna Menon

,

Wittawat Jitkrittum

,

Sadeep Jayasumana

,

,

Sashank J. Reddi

,

Proceedings of the 38th International Conference on Machine Learning, 2021

A statistical perspective on distillation.

[DOI]

Aditya Krishna Menon

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Adaptive Federated Optimization.

[DOI]

Sashank J. Reddi

,

Zachary Charles

,

,

Zachary Garrett

,

,

,

,

Hugh Brendan McMahan

Proceedings of the 9th International Conference on Learning Representations, 2021

RankDistil: Knowledge Distillation for Ranking.

[DOI]

Sashank J. Reddi

,

Rama Kumar Pasumarthi

,

Aditya Krishna Menon

,

Ankit Singh Rawat

,

,

,

,

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning.

[DOI]

Sai Praneeth Karimireddy

,

,

,

,

Sashank J. Reddi

,

Sebastian U. Stich

,

Ananda Theertha Suresh

CoRR, 2020

Why distillation helps: a statistical perspective.

[DOI]

Aditya Krishna Menon

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

,

CoRR, 2020

Doubly-stochastic mining for heterogeneous retrieval.

[DOI]

Ankit Singh Rawat

,

Aditya Krishna Menon

,

,

,

Sashank J. Reddi

,

CoRR, 2020

Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets.

[DOI]

Ilqar Ramazanli

,

,

,

Sashank J. Reddi

,

Barnabás Póczos

CoRR, 2020

Why are Adaptive Methods Good for Attention Models?

[DOI]

,

Sai Praneeth Karimireddy

,

,

,

Sashank J. Reddi

,

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers.

[DOI]

,

,

Srinadh Bhojanapalli

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning.

[DOI]

Sai Praneeth Karimireddy

,

,

,

Sashank J. Reddi

,

Sebastian U. Stich

,

Ananda Theertha Suresh

Proceedings of the 37th International Conference on Machine Learning, 2020

Low-Rank Bottleneck in Multi-head Attention Models.

[DOI]

Srinadh Bhojanapalli

,

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

Proceedings of the 37th International Conference on Machine Learning, 2020

Are Transformers universal approximators of sequence-to-sequence functions?

[DOI]

,

Srinadh Bhojanapalli

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

Proceedings of the 8th International Conference on Learning Representations, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.

[DOI]

,

,

Sashank J. Reddi

,

,

,

Srinadh Bhojanapalli

,

,

,

,

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning to Learn by Zeroth-Order Oracle.

[DOI]

,

,

Sashank J. Reddi

,

,

Proceedings of the 8th International Conference on Learning Representations, 2020

Can gradient clipping mitigate label noise?

[DOI]

Aditya Krishna Menon

,

Ankit Singh Rawat

,

Sashank J. Reddi

,

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Why ADAM Beats SGD for Attention Models.

[DOI]

,

Sai Praneeth Karimireddy

,

,

,

Sashank J. Reddi

,

,

CoRR, 2019

SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning.

[DOI]

Sai Praneeth Karimireddy

,

,

,

Sashank J. Reddi

,

Sebastian U. Stich

,

Ananda Theertha Suresh

CoRR, 2019

AdaCliP: Adaptive Clipping for Private SGD.

[DOI]

Venkatadheeraj Pichapati

,

Ananda Theertha Suresh

,

,

Sashank J. Reddi

,

CoRR, 2019

Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces.

[DOI]

,

,

,

Daniel Niels Holtmann-Rice

,

,

Sashank J. Reddi

,

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Escaping Saddle Points with Adaptive Gradient Methods.

[DOI]

,

Sashank J. Reddi

,

,

,

Proceedings of the 36th International Conference on Machine Learning, 2019

Stochastic Negative Mining for Learning with Large Output Spaces.

[DOI]

Sashank J. Reddi

,

,

,

Daniel Niels Holtmann-Rice

,

,

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Adaptive Methods for Nonconvex Optimization.

[DOI]

,

Sashank J. Reddi

,

Devendra Singh Sachan

,

,

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On the Convergence of Adam and Beyond.

[DOI]

Sashank J. Reddi

,

,

Proceedings of the 6th International Conference on Learning Representations, 2018

A Generic Approach for Escaping Saddle points.

[DOI]

Sashank J. Reddi

,

,

,

Barnabás Póczos

,

Francis R. Bach

,

Ruslan Salakhutdinov

,

Alexander J. Smola

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017

New Optimization Methods for Modern Machine Learning.

[DOI]

Sashank J. Reddi

PhD thesis, 2017

2016

Fast stochastic optimization on Riemannian manifolds.

[DOI]

,

Sashank J. Reddi

,

CoRR, 2016

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

Alexander J. Smola

CoRR, 2016

Fast Incremental Method for Nonconvex Optimization.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

Alexander J. Smola

CoRR, 2016

AIDE: Fast and Communication Efficient Distributed Optimization.

[DOI]

Sashank J. Reddi

,

,

Peter Richtárik

,

Barnabás Póczos

,

Alexander J. Smola

CoRR, 2016

Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.

[DOI]

,

Sashank J. Reddi

,

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Variance Reduction in Stochastic Gradient Langevin Dynamics.

[DOI]

Kumar Avinava Dubey

,

Sashank J. Reddi

,

Sinead A. Williamson

,

Barnabás Póczos

,

Alexander J. Smola

,

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Stochastic Variance Reduction for Nonconvex Optimization.

[DOI]

Sashank J. Reddi

,

,

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the 33rd International Conference on Machine Learning, 2016

Fast incremental method for smooth nonconvex optimization.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the 55th IEEE Conference on Decision and Control, 2016

Stochastic Frank-Wolfe methods for nonconvex optimization.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

2015

Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing.

[DOI]

,

Sashank J. Reddi

,

Barnabás Póczos

,

,

Larry A. Wasserman

CoRR, 2015

Communication Efficient Coresets for Empirical Loss Minimization.

[DOI]

Sashank J. Reddi

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

Large-scale randomized-coordinate descent methods with non-separable linear constraints.

[DOI]

Sashank J. Reddi

,

,

,

,

Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants.

[DOI]

Sashank J. Reddi

,

,

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

,

Larry A. Wasserman

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Doubly Robust Covariate Shift Correction.

[DOI]

Sashank Jakkam Reddi

,

Barnabás Póczos

,

Alexander J. Smola

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions.

[DOI]

,

Sashank Jakkam Reddi

,

Barnabás Póczos

,

,

Larry A. Wasserman

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Kernel MMD, the Median Heuristic and Distance Correlation in High Dimensions.

[DOI]

Sashank J. Reddi

,

,

Barnabás Póczos

,

,

Larry A. Wasserman

CoRR, 2014

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives.

[DOI]

,

Sashank J. Reddi

,

Barnabás Póczos

,

,

Larry A. Wasserman

CoRR, 2014

k-NN Regression on Functional Data with Incomplete Observations.

[DOI]

Sashank J. Reddi

,

Barnabás Póczos

Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

2013

Scale Invariant Conditional Dependence Measures.

[DOI]

Sashank J. Reddi

,

Barnabás Póczos

Proceedings of the 30th International Conference on Machine Learning, 2013

2012

Incentive Decision Processes.

[DOI]

Sashank Jakkam Reddi

,

Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

A Maximum Likelihood Approach For Selecting Sets of Alternatives.

[DOI]

Ariel D. Procaccia

,

Sashank Jakkam Reddi

,

Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

2010

MAP estimation in Binary MRFs via Bipartite Multi-cuts.

[DOI]

Sashank Jakkam Reddi

,

Sunita Sarawagi

,

Sundar Vishwanathan

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010
