Alexey Tumanov

Orcid: 0009-0005-7862-1477

According to our database, Alexey Tumanov authored at least 69 papers between 2007 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs.

CoRR, May, 2026

Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving.

CoRR, January, 2026

Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation.

Proceedings of the 21st European Conference on Computer Systems, 2026

LayoutBench: Performance Benchmarking of Cloud Storage Layouts for Multimedia Data.

Proceedings of the Sixth European Workshop on Machine Learning and Systems, EuroMLSys 2026, 2026

2025

Toward Weight Sharing Paradigm for Efficient AI: Training and Inference Serving.

ACM SIGOPS Oper. Syst. Rev., July, 2025

EMPIRIC: Exploring Missing Pieces in KV Cache Compression for Reducing Computation, Storage, and Latency in Long-Context LLM Inference.

ACM SIGOPS Oper. Syst. Rev., July, 2025

Efficient LLM Inference via Chunked Prefills.

ACM SIGOPS Oper. Syst. Rev., July, 2025

On Evaluating Performance of LLM Inference Serving Systems.

CoRR, July, 2025

Maya: Optimizing Deep Learning Training Workloads using Emulated Virtual Accelerators.

CoRR, March, 2025

∇QDARTS: Quantization as an Elastic Dimension to Differentiable NAS.

Trans. Mach. Learn. Res., 2025

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Client Availability in Federated Learning: It Matters!

Proceedings of the 5th Workshop on Machine Learning and Systems, 2025

VillainNet: Targeted Poisoning Attacks Against SuperNets Along the Accuracy-Latency Pareto Frontier.

Brendan Saltaformaggio

Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025

2024

PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off.

Sachit Kuhar

Yash Jain

Alexey Tumanov

Trans. Mach. Learn. Res., 2024

Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning.

CoRR, 2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations.

CoRR, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems.

CoRR, 2024

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training.

CoRR, 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

VIDUR: A Large-Scale Simulation Framework for LLM Inference.

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-device Inference.

Proceedings of the Computer Vision - ECCV 2024, 2024

DεpS: Delayed ε-Shrinking for Faster Once-for-All Training.

Proceedings of the Computer Vision - ECCV 2024, 2024

Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization.

Amey Agrawal

Sameer Reddy

Satwik Bhattamishra

Venkata Prabhakara Sarath Nookala

Vidushi Vashishth

Kexin Rong

Alexey Tumanov

Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

2023

Hardware-Software Co-Design for Real-Time Latency-Accuracy Navigation in Tiny Machine Learning Applications.

IEEE Micro, 2023

Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off.

Sachit Kuhar

Yash Jain

Alexey Tumanov

CoRR, 2023

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation.

CoRR, 2023

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems.

CoRR, 2023

DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization.

Amey Agrawal

Sameer Reddy

Satwik Bhattamishra

Venkata Prabhakara Sarath Nookala

Vidushi Vashishth

Kexin Rong

Alexey Tumanov

CoRR, 2023

SuperFed: Weight Shared Federated Learning.

CoRR, 2023

Subgraph Stationary Hardware-Software Inference Co-Design.

Abhimanyu Rajeshkumar Bambhaniya

Alind Khare

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

TransEHR: Self-Supervised Transformer for Clinical Time Series Data.

Proceedings of the Machine Learning for Health, 2023

2022

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification.

Rishikesan Kamaleswaran

Chao Zhang

Alexey Tumanov

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing.

Proceedings of the Euro-Par 2022: Parallel Processing, 2022

ESCHER: expressive scheduling with ephemeral resources.

Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022

2021

CompOFA - Compound Once-For-All Networks for Faster Multi-Platform Deployment.

Proceedings of the 9th International Conference on Learning Representations, 2021

RubberBand: cloud-based hyperparameter tuning.

Kirthevasan Kandasamy

Joseph E. Gonzalez

Ion Stoica

Alexey Tumanov

Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

2020

Cloudburst: Stateful Functions-as-a-Service.

Vikram Sreekanti

Chenggang Wu

Xiayue Charles Lin

Johann Schleier-Smith

Joseph Gonzalez

Joseph M. Hellerstein

Alexey Tumanov

Proc. VLDB Endow., 2020

Cloudburst: Stateful Functions-as-a-Service.

Vikram Sreekanti

Chenggang Wu

Xiayue Charles Lin

Johann Schleier-Smith

Jose M. Faleiro

Joseph E. Gonzalez

Joseph M. Hellerstein

Alexey Tumanov

CoRR, 2020

HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

InferLine: latency-aware provisioning and scaling for prediction serving pipelines.

Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019

The OoO VLIW JIT Compiler for GPU Inference.

CoRR, 2019

Dynamic Space-Time Scheduling for GPU Inference.

CoRR, 2019

Lineage stash: fault tolerance off the critical path.

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline.

Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Cirrus: a Serverless Framework for End-to-end ML Workflows.

Proceedings of the ACM Symposium on Cloud Computing, SoCC 2019, 2019

Serverless Computing: One Step Forward, Two Steps Back.

Joseph M. Hellerstein

Jose M. Faleiro

Joseph Gonzalez

Johann Schleier-Smith

Vikram Sreekanti

Alexey Tumanov

Chenggang Wu

Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018

InferLine: ML Inference Pipeline Composition Framework.

CoRR, 2018

Tributary: spot-dancing for elastic services with latency SLOs.

Proceedings of the 2018 USENIX Annual Technical Conference, 2018

IDK Cascades: Fast Deep Learning by Learning not to Overthink.

Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Ray: A Distributed Framework for Emerging AI Applications.

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

3Sigma: distribution-based cluster scheduling for runtime uncertainty.

Proceedings of the Thirteenth EuroSys Conference, 2018

2017

Ray: A Distributed Framework for Emerging AI Applications.

CoRR, 2017

IDK Cascades: Fast Deep Learning by Learning not to Overthink.

CoRR, 2017

Real-Time Machine Learning: The Missing Pieces.

Johann Schleier-Smith

Proceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets.

Proceedings of the Twelfth European Conference on Computer Systems, 2017

2016

Scheduling with Space-Time Soft Constraints In Heterogeneous Cloud Datacenters.

Alexey Tumanov

PhD thesis, 2016

Morpheus: Towards Automated SLOs for Enterprise Clusters.

Sangeetha Abdu Jyothi

Carlo Curino

Ishai Menache

Shravan Matthur Narayanamurthy

Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters.

Proceedings of the Eleventh European Conference on Computer Systems, 2016

2014

Agility and Performance in Elastic Distributed Storage.

ACM Trans. Storage, 2014

SpringFS: bridging agility and performance in elastic distributed storage.

Proceedings of the 12th USENIX conference on File and Storage Technologies, 2014

PriorityMeister: Tail Latency QoS for Shared Networked Storage.

Proceedings of the ACM Symposium on Cloud Computing, 2014

Exploiting iterative-ness for parallel ML computations.

Jesse Haber-Kucharsky

Proceedings of the ACM Symposium on Cloud Computing, 2014

2012

alsched: algebraic scheduling of mixed workloads in heterogeneous clouds.

Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

Heterogeneity and dynamicity of clouds at scale: Google trace analysis.

Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

2011

Kaleidoscope: cloud micro-elasticity via VM state coloring.

H. Andrés Lagar-Cavilla

Eyal de Lara

Proceedings of the European Conference on Computer Systems, 2011

2007

Variability-Aware Latency Amelioration in Distributed Environments.

Alexey Tumanov

Robert S. Allison

Wolfgang Stürzlinger

Proceedings of the IEEE Virtual Reality Conference, 2007
