Gabriele Oliaro

ORCID: 0000-0001-5406-0736

According to our database, Gabriele Oliaro authored at least 20 papers between 2021 and 2026.

Bibliography

2026
FastKernels: Benchmarking GPU Kernel Generation in Production.
CoRR, May, 2026

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel.
CoRR, April, 2026

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems.
ACM Comput. Surv., January, 2026

FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees.
Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding.
Proceedings of the 21st European Conference on Computer Systems, 2026

2025
OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs.
CoRR, October, 2025

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning.
CoRR, April, 2025

AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding.
CoRR, January, 2025

SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference.
CoRR, 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning.
CoRR, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.
CoRR, 2024

Reproducibility Report for ACM SIGMOD 2024 Paper: 'Hierarchical Cut Labelling - Scaling Up Distance Queries on Road Networks'.
Proceedings of the Reproducibility Reports of the 2024 International Conference on Management of Data, 2024

SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Optimal Kernel Orchestration for Tensor Programs with Korch.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.
CoRR, 2023

Direct Telemetry Access.
Proceedings of the ACM SIGCOMM 2023 Conference, 2023

2022
Direct Telemetry Access.
CoRR, 2022

2021
Zero-CPU Collection with Direct Telemetry Access.
Proceedings of the HotNets '21: The 20th ACM Workshop on Hot Topics in Networks, 2021

