Dimitris S. Papailiopoulos

CoRR, May, 2026

Open-World Evaluations for Measuring Frontier AI Capabilities.

CoRR, May, 2026

MEMENTO: Teaching LLMs to Manage Their Own Context.

CoRR, April, 2026

Endless Terminals: Scaling RL Environments for Terminal Agents.

Kanishk Gandhi

Shivam Garg

Noah D. Goodman

CoRR, January, 2026

2025

Wait, Wait, Wait... Why Do Reasoning Models Loop?

Charilaos Pipis

Shivam Garg

Vasilis Kontonis

Vaishnavi Shrivastava

Akshay Krishnamurthy

CoRR, December, 2025

ReJump: A Tree-Jump Representation for Analyzing and Improving LLM Reasoning.

CoRR, December, 2025

Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models.

CoRR, October, 2025

Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning.

Vaishnavi Shrivastava

Ahmed Awadallah

Vidhisha Balachandran

Shivam Garg

Harkirat S. Behl

CoRR, August, 2025

Extrapolation by Association: Length Generalization Transfer in Transformers.

CoRR, June, 2025

Phi-4-reasoning Technical Report.

CoRR, April, 2025

Task Vectors in In-Context Learning: Emergence, Formation, and Benefit.

Ziqian Lin

Robert D. Nowak

CoRR, January, 2025

Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries.

Junhyuck Kim

Jongho Park

Jaewoong Cho

Proceedings of the Forty-second International Conference on Machine Learning, 2025

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data.

Zheyang Xiong

Vasilis Papageorgiou

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

How Well Can Transformers Emulate In-Context Newton's Method?

Tianhao Wang

Jason D. Lee

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2025

2024

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding.

Seongjun Yang

Gibbeum Lee

Jaewoong Cho

Trans. Mach. Learn. Res., 2024

Mini-Batch Optimization of Contrastive Loss.

Trans. Mach. Learn. Res., 2024

Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

CHAI: Clustered Head Attention for Efficient LLM Inference.

Carole-Jean Wu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Looped Transformers are Better at Learning Learning Algorithms.

Robert D. Nowak

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Teaching Arithmetic to Small Transformers.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs.

CoRR, 2023

The Expressive Power of Tuning Only the Norm Layers.

CoRR, 2023

Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning.

Muhammed Emrullah Ildiz

CoRR, 2023

Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cuttlefish: Low-Rank Model Training without All the Tuning.

Pongsakorn U.-Chupala

Yoshiki Tanaka

Eric P. Xing

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Transformers as Algorithms: Generalization and Stability in In-context Learning.

Muhammed Emrullah Ildiz

Proceedings of the International Conference on Machine Learning, 2023

Looped Transformers as Programmable Computers.

Proceedings of the International Conference on Machine Learning, 2023

The Expressive Power of Tuning Only the Normalization Layers.

Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

Prompted LLMs as Chatbot Modules for Long Open-domain Conversation.

Gibbeum Lee

Volker Hartmann

Jongho Park

Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

A Better Way to Decay: Proximal Gradient Training Algorithms for Neural Nets.

Jifan Zhang

Joseph Shenouda

Robert D. Nowak

CoRR, 2022

Rare Gems: Finding Lottery Tickets at Initialization.

CoRR, 2022

Rare Gems: Finding Lottery Tickets at Initialization.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Utility of Gradient Compression in Distributed Training Systems.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

GenLabel: Mixup Relabeling using Generative Models.

Proceedings of the International Conference on Machine Learning, 2022

Permutation-Based SGD: Is Random Optimal?

Proceedings of the Tenth International Conference on Learning Representations, 2022

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment.

Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Finding Nearly Everything within Random Binary Networks.

Jy-yong Sohn

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Finding Everything within Random Binary Networks.

Jy-yong Sohn

CoRR, 2021

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks.

Amin Karbasi

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pufferfish: Communication-efficient Models At No Extra Cost.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Adaptive Gradient Communication via Critical Learning Regime Identification.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

2020

Machine Learning From Distributed, Streaming Data [From the Guest Editors].

Waheed U. Bajwa

Volkan Cevher

Anna Scaglione

IEEE Signal Process. Mag., 2020

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification.

CoRR, 2020

Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient.

CoRR, 2020

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bad Global Minima Exist and SGD Can Reach Them.

Shengchao Liu

Dimitris Achlioptas

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Closing the convergence gap of SGD without replacement.

Anant Gupta

Proceedings of the 37th International Conference on Machine Learning, 2020

Federated Learning with Matched Averaging.

Mikhail Yurochkin

Yuekai Sun

Yasaman Khazaeni

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Convergence and Margin of Adversarial Training on Separable Data.

Stephen J. Wright

CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.

CoRR, 2019

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding.

CoRR, 2019

DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Does Data Augmentation Lead to Positive Margin?

Proceedings of the 36th International Conference on Machine Learning, 2019

A Geometric Perspective on the Transferability of Adversarial Directions.

Harrison Rosenberg

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Coding Theory for Inference, Learning and Optimization (Dagstuhl Seminar 18112).

Po-Ling Loh

Arya Mazumdar

Rüdiger L. Urbanke

Dagstuhl Reports, 2018

Gradient Coding via the Stochastic Block Model.

CoRR, 2018

DRACO: Robust Distributed Training via Redundant Gradients.

Lingjiao Chen

CoRR, 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification.

Stephen J. Wright

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Effect of Network Width on the Performance of Large-batch Training.

Lingjiao Chen

Jinman Zhao

Paraschos Koutris

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Gradient Coding Using the Stochastic Block Model.

Proceedings of the 2018 IEEE International Symposium on Information Theory, 2018

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients.

Lingjiao Chen

Proceedings of the 35th International Conference on Machine Learning, 2018

Stability and Generalization of Learning Algorithms that Converge to Global Optima.

Proceedings of the 35th International Conference on Machine Learning, 2018

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning.

Dong Yin

Ashwin Pananjady

Peter L. Bartlett

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization.

Horia Mania

Xinghao Pan

Benjamin Recht

Michael I. Jordan

SIAM J. Optim., 2017

Approximate Gradient Coding via Sparse Random Graphs.

Jordan S. Ellenberg

CoRR, 2017

Gradient Diversity Empowers Distributed Learning.

Dong Yin

Ashwin Pananjady

Peter L. Bartlett

CoRR, 2017

Coded computation for multicore setups.

Ramtin Pedarsani

Proceedings of the 2017 IEEE International Symposium on Information Theory, 2017

2016

CYCLADES: Conflict-free Asynchronous Machine Learning.

Xinghao Pan

Stephen Tu

CoRR, 2016

Speeding up distributed machine learning using codes.

Ramtin Pedarsani

Proceedings of the IEEE International Symposium on Information Theory, 2016

Bipartite Correlation Clustering: Maximizing Agreements.

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015

On the Worst-Case Approximability of Sparse PCA.

Siu On Chan

Aviad Rubinstein

CoRR, 2015

Parallel Correlation Clustering on Big Graphs.

Xinghao Pan

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Sparse PCA via Bipartite Matchings.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Orthogonal NMF through Subspace Exploration.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014

The Sparse Principal Component of a Constant-Rank Matrix.

IEEE Trans. Inf. Theory, 2014

Provable deterministic leverage score sampling.

Christos Boutsidis

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014

Locality and availability in distributed storage.

Ankit Singh Rawat

Sriram Vishwanath

Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 2014

On codes with availability for distributed storage.

Ankit Singh Rawat

Sriram Vishwanath

Proceedings of the 6th International Symposium on Communications, Control and Signal Processing, 2014

Finding Dense Subgraphs via Low-Rank Bilinear Optimization.

Ioannis Mitliagkas

Constantine Caramanis

Proceedings of the 31st International Conference on Machine Learning, 2014

Nonnegative Sparse PCA with Provable Guarantees.

Proceedings of the 31st International Conference on Machine Learning, 2014

Combinatorial QPs via a low-dimensional subspace sampling.

Proceedings of the 48th Annual Conference on Information Sciences and Systems, 2014

2013

Maximum-Likelihood Noncoherent PAM Detection.

Georgina Abou Elkheir

IEEE Trans. Commun., 2013

XORing Elephants: Novel Erasure Codes for Big Data.

Maheswaran Sathiamoorthy

Ramkumar Vadali

Scott Chen

Dhruba Borthakur

Proc. VLDB Endow., 2013

Optimal locally repairable codes and connections to matroid theory.

Itzhak Tamo

Proceedings of the 2013 IEEE International Symposium on Information Theory, 2013

Sparse PCA through Low-rank Approximations.

Stavros Korokythakis

Proceedings of the 30th International Conference on Machine Learning, 2013

Availability and locality in distributed storage.

Ankit Singh Rawat

Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

2012

Feedback in the K-user interference channel.

Changho Suh

Proceedings of the 2012 IEEE International Symposium on Information Theory, 2012

Locally repairable codes.

Proceedings of the 2012 IEEE International Symposium on Information Theory, 2012

Simple regenerating codes: Network coding for cloud storage.

Jianqiang Luo

Cheng Huang

Jin Li

Proceedings of IEEE INFOCOM 2012, Orlando, FL, USA, March 25-30, 2012

Maximum-likelihood blind PAM detection.

Georgina Abou Elkheir

Proceedings of IEEE International Conference on Communications, 2012

A repair framework for scalar MDS codes.

Karthikeyan Shanmugam

Giuseppe Caire

Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, 2012

2011

Distributed storage codes through Hadamard designs.

Proceedings of the 2011 IEEE International Symposium on Information Theory, 2011

Sparse principal component of a rank-deficient matrix.

Proceedings of the 2011 IEEE International Symposium on Information Theory, 2011

Repair optimal erasure codes through Hadamard designs.

Viveck R. Cadambe

Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing, 2011

2010

Maximum-likelihood noncoherent OSTBC detection with polynomial complexity.

IEEE Trans. Wirel. Commun., 2010

Interference Alignment as a Rank Constrained Rank Minimization.

Proceedings of the Global Communications Conference, 2010

Distributed storage codes meet multiple-access wiretap channels.

Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, 2010

MCMC methods for integer least-squares problems.

Babak Hassibi

Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, 2010

2008

Polynomial-complexity maximum-likelihood block noncoherent MPSK detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Efficient computation of the M-phase vector that maximizes a rank-deficient quadratic form.

Proceedings of the 42nd Annual Conference on Information Sciences and Systems, 2008

Efficient maximum-likelihood noncoherent orthogonal STBC detection.
