Elias Frantar

According to our database, Elias Frantar authored at least 24 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
Extreme Compression of Large Language Models via Additive Quantization.
CoRR, 2024

2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models.
CoRR, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models.
CoRR, 2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models.
CoRR, 2023

Scaling Laws for Sparsely-Connected Foundation Models.
CoRR, 2023

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization.
CoRR, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models.
CoRR, 2023

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.
CoRR, 2023

JaxPruner: A concise library for sparsity research.
CoRR, 2023

Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression.
CoRR, 2023

ZipLM: Hardware-Aware Structured Pruning of Language Models.
CoRR, 2023

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ZipLM: Inference-Aware Structured Pruning of Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot.
Proceedings of the International Conference on Machine Learning, 2023

OPTQ: Accurate Quantization for Generative Pre-trained Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
L-GreCo: An Efficient and General Framework for Layerwise-Adaptive Gradient Compression.
CoRR, 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.
CoRR, 2022

oViT: An Accurate Second-Order Pruning Framework for Vision Transformers.
CoRR, 2022

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SPDY: Accurate Pruning with Speedup Guarantees.
Proceedings of the International Conference on Machine Learning, 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization.
CoRR, 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
On the Sample Complexity of Adversarial Multi-Source PAC Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020
