Tijmen Blankevoort

Raghuraman Krishnamoorthi

Vikas Chandra

CoRR, February, 2025

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization.

Raghuraman Krishnamoorthi

Vikas Chandra

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

SpinQuant: LLM Quantization with Learned Rotations.

Vikas Chandra

Yuandong Tian

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs.

Dawid Jan Kopiczko

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations.

CoRR, 2024

Bitune: Bidirectional Instruction-Tuning.

Dawid Jan Kopiczko

CoRR, 2024

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding.

Tycho F. A. van der Ouderaa

CoRR, 2024

GPTVQ: The Blessing of Dimensionality for LLM Quantization.

CoRR, 2024

The LLM Surgeon.

Markus Nagel

Mart van Baalen

Proceedings of the Twelfth International Conference on Learning Representations, 2024

VeRA: Vector-based Random Matrix Adaptation.

Dawid Jan Kopiczko

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning.

Tycho F. A. van der Ouderaa

Proceedings of the 35th British Machine Vision Conference, 2024

2023

The LLM Surgeon.

CoRR, 2023

FP8 versus INT8 for efficient deep learning inference.

CoRR, 2023

Scalarization for Multi-Task and Multi-Domain Learning at Scale.

Amelie Royer

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Pruning vs Quantization: Which is Better?

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing.

Yelysei Bondarenko

Markus Nagel

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MSViT: Dynamic Mixed-scale Tokenization for Vision Transformers.

Jakob Drachmann Havtorn

Amélie Royer

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Neural PDE-Solvers using Quantization Aware Training.

Winfried van den Dool

Max Welling

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Practical Mixed Precision Algorithm for Post-Training Quantization.

Proceedings of the 34th British Machine Vision Conference Workshop Proceedings, 2023

2022

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET).

CoRR, 2022

FP8 Quantization: The Power of the Exponent.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Overcoming Oscillations in Quantization-Aware Training.

Proceedings of the International Conference on Machine Learning, 2022

Cyclical Pruning for Sparse Neural Networks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Simple and Efficient Architectures for Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Simulated Quantization, Real Power Savings.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Revisiting single-gated Mixtures of Experts.

Amelie Royer

Ilia Karmanov

Andrii Skliar

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

A White Paper on Neural Network Quantization.

CoRR, 2021

Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization.

Yelysei Bondarenko

Markus Nagel

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Learned Threshold Pruning.

CoRR, 2020

Gradient $\ell_1$ Regularization for Quantization Robustness.

CoRR, 2020

Bayesian Bits: Unifying Quantization and Pruning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Up or Down? Adaptive Rounding for Post-Training Quantization.

Proceedings of the 37th International Conference on Machine Learning, 2020

Batch-shaping for learning conditional channel gated networks.

Max Welling

Proceedings of the 8th International Conference on Learning Representations, 2020

Gradient $\ell_1$ Regularization for Quantization Robustness.

Proceedings of the 8th International Conference on Learning Representations, 2020

Differentiable Joint Pruning and Quantization for Hardware Efficiency.

Ying Wang

Yadong Lu

Proceedings of the Computer Vision - ECCV 2020, 2020

LSQ+: Improving low-bit quantization through learnable offsets and better initialization.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Conditional Channel Gated Networks for Task-Aware Continual Learning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks.

CoRR, 2019

Batch-Shaped Channel Gated Networks.
