Elias Frantar

ORCID: 0009-0004-8073-8845

According to our database, Elias Frantar authored at least 32 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Compression Scaling Laws: Unifying Sparsity and Quantization.
CoRR, February 2025

TACO: Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression.
Trans. Mach. Learn. Res., 2025

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

2024
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization.
Trans. Mach. Learn. Res., 2024

L-GreCo: Layerwise-Adaptive Gradient Compression For Efficient Data-Parallel Deep Learning.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

QMoE: Sub-1-Bit Compression of Trillion Parameter Models.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Error Feedback Can Accurately Compress Preconditioners.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPADE: Sparsity-Guided Debugging for Deep Neural Networks.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Extreme Compression of Large Language Models via Additive Quantization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Scaling Laws for Sparsely-Connected Foundation Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models.
CoRR, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models.
CoRR, 2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models.
CoRR, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models.
CoRR, 2023

JaxPruner: A concise library for sparsity research.
CoRR, 2023

Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression.
CoRR, 2023

ZipLM: Hardware-Aware Structured Pruning of Language Models.
CoRR, 2023

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

ZipLM: Inference-Aware Structured Pruning of Language Models.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot.
Proceedings of the International Conference on Machine Learning, 2023

OPTQ: Accurate Quantization for Generative Pre-trained Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
L-GreCo: An Efficient and General Framework for Layerwise-Adaptive Gradient Compression.
CoRR, 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.
CoRR, 2022

oViT: An Accurate Second-Order Pruning Framework for Vision Transformers.
CoRR, 2022

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.
Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022

SPDY: Accurate Pruning with Speedup Guarantees.
Proceedings of the International Conference on Machine Learning, 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization.
CoRR, 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021

2020
On the Sample Complexity of Adversarial Multi-Source PAC Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020
