Gabriele Oliaro

Orcid: 0000-0001-5406-0736

According to our database¹, Gabriele Oliaro authored at least 20 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

FastKernels: Benchmarking GPU Kernel Generation in Production.

CoRR, May, 2026

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel.

CoRR, April, 2026

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems.

ACM Comput. Surv., January, 2026

FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding.

Proceedings of the 21st European Conference on Computer Systems, 2026

2025

OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs.

CoRR, October, 2025

SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning.

CoRR, April, 2025

AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding.

CoRR, January, 2025

SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference.

CoRR, 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning.

CoRR, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.

CoRR, 2024

Reproducibility Report for ACM SIGMOD 2024 Paper: 'Hierarchical Cut Labelling - Scaling Up Distance Queries on Road Networks'.

Proceedings of the Reproducibility Reports of the 2024 International Conference on Management of Data, 2024

SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Optimal Kernel Orchestration for Tensor Programs with Korch.

Muyan Hu

Ashwin Venkatram

Shreyashri Biswas

Balamurugan Marimuthu

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.

CoRR, 2023

Direct Telemetry Access.

Proceedings of the ACM SIGCOMM 2023 Conference, 2023

2022

Direct Telemetry Access.

Jonatan Langlet

Ran Ben Basat

Sivaramakrishnan Ramanathan

CoRR, 2022

2021

Zero-CPU Collection with Direct Telemetry Access.

Jonatan Langlet

Ran Ben Basat

Sivaramakrishnan Ramanathan

Proceedings of the HotNets '21: The 20th ACM Workshop on Hot Topics in Networks, 2021
