Rohan Anil

According to our database¹, Rohan Anil authored at least 33 papers between 2016 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

Fast and Simplex: 2-Simplicial Attention in Triton.

CoRR, July, 2025

2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs.

Ankit Singh Rawat

Veeranjaneyulu Sadhanala

CoRR, 2024

Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries.

CoRR, 2024

Learning from straggler clients in federated learning.

CoRR, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

Jean-Baptiste Alayrac

et al.

CoRR, 2024

Combining Axes Preconditioners through Kronecker Approximation for Deep Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Layerwise Bregman Representation Learning of Neural Networks with Applications to Knowledge Distillation.

Trans. Mach. Learn. Res., 2023

Heterogeneous Federated Learning Using Knowledge Codistillation.

CoRR, 2023

Benchmarking Neural Network Training Algorithms.

CoRR, 2023

PaLM 2 Technical Report.

Kathy Meier-Hellstern

Gustavo Hernández Ábrego

Christopher A. Choquette-Choo

et al.

CoRR, 2023

Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Computationally Efficient Sparsified Online Newton Method.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

Layerwise Bregman Representation Learning with Applications to Knowledge Distillation.

CoRR, 2022

N-Grammer: Augmenting Transformers with latent n-grams.

CoRR, 2022

Learning from Randomly Initialized Neural Network Features.

CoRR, 2022

Step-size Adaptation Using Exponentiated Gradient Updates.

CoRR, 2022

On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models.

Proceedings of the 5th Workshop on Online Recommender Systems and User Modeling co-located with the 16th ACM Conference on Recommender Systems, 2022

Large-Scale Differentially Private BERT.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Knowledge distillation: A good teacher is patient and consistent.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LocoProp: Enhancing BackProp via Local Loss Optimization.

Ehsan Amid

Rohan Anil

Manfred K. Warmuth

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes.

Zachary Nado

Justin Gilmer

Christopher J. Shallue

Rohan Anil

George E. Dahl

CoRR, 2021

Efficiently Identifying Task Groupings for Multi-Task Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

Measuring and Harnessing Transference in Multi-Task Learning.

CoRR, 2020

Disentangling Adaptive Gradient Methods from Learning Rates.

CoRR, 2020

Second Order Optimization Made Practical.

CoRR, 2020

Stochastic Optimization with Laggard Data Pipelines.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

CoRR, 2019

Memory-Efficient Adaptive Optimization for Large-Scale Learning.

CoRR, 2019

Memory Efficient Adaptive Optimization.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank.

Rama Kumar Pasumarthi

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

2018

Large scale distributed neural network training through online distillation.

Proceedings of the 6th International Conference on Learning Representations, 2018

2016

Wide & Deep Learning for Recommender Systems.

Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016
