Amirkeivan Mohtashami

Matteo Pagliardini

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.

Saleh Ashkboos

CoRR, 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging.

Matteo Pagliardini

François Fleuret

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.

Saleh Ashkboos

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Social Learning: Towards Collaborative Learning with Large Language Models.

Blaise Agüera y Arcas

CoRR, 2023

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models.

CoRR, 2023

CoTFormer: More Tokens With Attention Make Up For Less Depth.

Matteo Pagliardini

CoRR, 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers.

CoRR, 2023

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models.

Mauro Verzetti

Paul K. Rubenstein

CoRR, 2023

Random-Access Infinite Context Length for Transformers.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Special Properties of Gradient Descent with Large Learning Rates.

Proceedings of the International Conference on Machine Learning, 2023

2022

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.

CoRR, 2022

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods.

CoRR, 2022

Masked Training of Neural Networks with Partial Gradients.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Simultaneous Training of Partially Masked Neural Networks.

CoRR, 2021

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

The Splay-List: A Distribution-Adaptive Concurrent Skip-List.

Vitaly Aksenov

Dan Alistarh

Alexandra Drozdova