Shivaram Venkataraman

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Scaling Inference-Efficient Language Models.

Song Bian

Minghao Yan

Proceedings of the Forty-second International Conference on Machine Learning, 2025

TUNA: Tuning Unstable and Noisy Cloud Applications.

Johannes Freischuetz

Brian Kroth

Proceedings of the Twentieth European Conference on Computer Systems, 2025

Eva: Cost-Efficient Cloud-Based Cluster Scheduling.

Tzu-Tao Chang

Proceedings of the Twentieth European Conference on Computer Systems, 2025

Diamond: Harnessing GPU Resources for Scientific Deep Learning.

Volodymyr V. Kindratenko

Kyle Chard

Ian T. Foster

Zhao Zhang

Proceedings of the IEEE International Conference on eScience, 2025

Striking the Right Chord: Parameter Tuning in Memory Tiering Systems.

Sujay Yadalam

Michael Swift

Proceedings of the 3rd Workshop on Disruptive Memory Systems, 2025

2024

SYMPHONY: Improving Memory Management for LLM Inference Workloads.

Anyong Mao

Shihabur Rahman Chowdhury

CoRR, 2024

Incremental IVF Index Maintenance for Streaming Vector Search.

Jason Mohoney

Anil Pacaci

CoRR, 2024

GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval.

Dong Liu

Roger Waleffe

Meng Jiang

CoRR, 2024

PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters.

Proceedings of the International Conference for High Performance Computing, 2024

Does Compressing Activations Help Model Parallel Training?

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

CHAI: Clustered Head Attention for Efficient LLM Inference.

Dimitris Papailiopoulos

Carole-Jean Wu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Blox: A Modular Toolkit for Deep Learning Schedulers.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Nautilus: A Benchmarking Platform for DBMS Knob Tuning.

Johannes Freischuetz

Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 2024

2023

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices.

Minghao Yan

CoRR, 2023

F2: Designing a Key-Value Store for Large Skewed Workloads.

Badrish Chandramouli

CoRR, 2023

Bagpipe: Accelerating Deep Recommendation Model Training.

Chengpo Yan

Ziyi Zhang

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning.

Qiyang Ding

Pengfei Zheng

Shreyas Kudari

Zhao Zhang

Proceedings of the International Conference for High Performance Computing, 2023

Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning.

Pengfei Zheng

Rui Pan

Tarannum Khan

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks.

Roger Waleffe

Jason Mohoney

Theodoros Rekatsinas

Proceedings of the Eighteenth European Conference on Computer Systems, 2023

2022

LlamaTune: Sample-Efficient DBMS Configuration Tuning.

Proc. VLDB Endow., 2022

BagPipe: Accelerating Deep Recommendation Model Training.

Ziyi Zhang

CoRR, 2022

Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine.

Roger Waleffe

Jason Mohoney

Theodoros Rekatsinas

CoRR, 2022

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

On the Utility of Gradient Compression in Distributed Training Systems.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021

The Roaming Edge and its Applications.

Suman Banerjee

Remzi H. Arpaci-Dusseau

GetMobile Mob. Comput. Commun., 2021

Demonstration of Marius: Graph Embeddings with a Single Machine.

Proc. VLDB Endow., 2021

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning.

Yuhan Liu

CoRR, 2021

Learning Massive Graph Embeddings on a Single Machine.

CoRR, 2021

Accelerating Deep Learning Inference via Learned Caches.

Adarsh Kumar

Yuhan Liu

Han Cao

CoRR, 2021

KAISA: an adaptive second-order optimizer framework for deep neural networks.

J. Gregory Pauloski

Qi Huang

Lei Huang

Kyle Chard

Ian T. Foster

Zhao Zhang

Proceedings of the International Conference for High Performance Computing, 2021

Marius: Learning Massive Graph Embeddings on a Single Machine.

Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo.

Le Xu

Indranil Gupta

Luo Mai

Rahul Potharaju

Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, 2021

Adaptive Gradient Communication via Critical Learning Regime Identification.

Kangwook Lee

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Doing more by doing less: how structured partial backpropagation improves deep learning clusters.

Adarsh Kumar

Kausik Subramanian

Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning (DistributedML '21), 2021

Atoll: A Scalable Low-Latency Serverless Platform.

Kevin Houck

Mohammed Danish Shaikh

Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021

2020

Learning-Based Coded Computation.

IEEE J. Sel. Areas Inf. Theory, 2020

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification.

Kangwook Lee

CoRR, 2020

Themis: Fair and Efficient GPU Cluster Scheduling.

Kshiteej Mahajan

Shuchi Chawla

Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

Blink: Fast and Generic Collectives for Distributed ML.

Guanhua Wang

Proceedings of the Third Conference on Machine Learning and Systems, 2020

Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs.

Ramnatthan Alagappan

Proceedings of the 12th USENIX Workshop on Hot Topics in Storage and File Systems, 2020

Serverless linear algebra.

Jonathan Ragan-Kelley

Eric Jonas

Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019

Archipelago: A Scalable Low-Latency Serverless Platform.

Kevin Houck

Mohammed Danish Shaikh

CoRR, 2019

Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads.

Kshiteej Mahajan

Varun Batra

Surya Teja Chavali

Shuchi Chawla

CoRR, 2019

Parity Models: A General Framework for Coding-Based Resilience in ML Inference.

CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.

Alexandros G. Dimakis

Anastasios Kyrillidis

CoRR, 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.

Myeongjae Jeon

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Parity models: erasure-coded resilience for prediction serving systems.

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure.

Qifan Pu

Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019

Cracking open the DNN black-box: Video Analytics with DNNs across the Camera-Cloud Boundary.

John Emmons

Sadjad Fouladi

Ganesh Ananthanarayanan

Silvio Savarese

Keith Winstein

Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2019

Accelerating Deep Learning Inference via Freezing.

Adarsh Kumar

Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing, 2019

The Case for Unifying Data Loading in Machine Learning Clusters.

Aarati Kakaraparthy

Abhay Venkatesh

Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing, 2019

Serverless Event-Stream Processing over Virtual Actors.

Wentao Wu

Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018

Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems.

Paolo Costa

Terry Kim

Saravanam Muthukrishnan

Vamsi Kuppa

Sudheer Dhulipalla

Sriram Rao

Proc. VLDB Endow., 2018

numpywren: serverless linear algebra.

Jonathan Ragan-Kelley

CoRR, 2018

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation.

CoRR, 2018

ASAP: Fast, Approximate Graph Pattern Mining at Scale.

Anand Padmanabha Iyer

Zaoxing Liu

Xin Jin

Vladimir Braverman

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.

Kevin Hsieh

Ganesh Ananthanarayanan

Peter Bodík

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Towards Fast and Scalable Graph Pattern Mining.

Anand Padmanabha Iyer

Zaoxing Liu

Xin Jin

Vladimir Braverman

Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

Bridging the GAP: towards approximate graph analytics.

Anand Padmanabha Iyer

Aurojit Panda

Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2018

2017

System Design for Large Scale Machine Learning.

PhD thesis, 2017

Hemingway: Modeling Distributed Optimization Algorithms.

Xinghao Pan

Zizheng Tai

Joseph Gonzalez

CoRR, 2017

Occupy the Cloud: Distributed Computing for the 99%.

Eric Jonas

CoRR, 2017

Drizzle: Fast and Adaptable Stream Processing at Scale.

Proceedings of the 26th Symposium on Operating Systems Principles, 2017

CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics.

Omid Alipourfard

Hongqiang Harry Liu

Jianshu Chen

Minlan Yu

Ming Zhang

Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Breaking Locality Accelerates Block Gauss-Seidel.

Stephen Tu

Proceedings of the 34th International Conference on Machine Learning, 2017

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics.

Evan Randall Sparks

Tomer Kaftan

Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Occupy the cloud: distributed computing for the 99%.

Eric Jonas

Qifan Pu

Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016

MLlib: Machine Learning in Apache Spark.

J. Mach. Learn. Res., 2016

Large Scale Kernel Learning using Block Coordinate Descent.

Stephen Tu

Rebecca Roelofs

CoRR, 2016

Apache Spark: a unified engine for big data processing.

Commun. ACM, 2016

SparkR: Scaling R Programs with Spark.

Proceedings of the 2016 International Conference on Management of Data, 2016

Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics.

Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016

Matrix Computations and Optimization in Apache Spark.

Evan Randall Sparks

Aaron Staple

Matei Zaharia

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2015

linalg: Matrix Computations in Apache Spark.

Evan Randall Sparks

Alexander Ulanov

Matei Zaharia

CoRR, 2015

2014

Quantifying eventual consistency with PBS.

Peter Bailis

Joseph M. Hellerstein

Commun. ACM, 2014

Record Placement Based on Data Skew Using Solid State Drives.

Jun Suzuki

Sameer Agarwal

Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

The Power of Choice in Data-Aware Cluster Scheduling.

Aurojit Panda

Ganesh Ananthanarayanan

Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014

2013

PBS at work: advancing data management with consistency metrics.

Peter Bailis

Joseph M. Hellerstein

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

The Case for Tiny Tasks in Compute Clusters.

Kay Ousterhout

Aurojit Panda

Josh Rosen

Proceedings of the 14th Workshop on Hot Topics in Operating Systems, 2013

Presto: distributed machine learning and graph processing with sparse matrices.

Proceedings of the Eighth Eurosys Conference 2013, 2013

2012

Probabilistically Bounded Staleness for Practical Partial Quorums.

Peter Bailis

Joseph M. Hellerstein

Proc. VLDB Endow., 2012

Sweet Storage SLOs with Frosting.

Andrew Wang

Sara Alspaugh

Randy H. Katz

Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

Using R for Iterative and Incremental Processing.

Indrajit Roy

Alvin AuYoung

Robert S. Schreiber

Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

Cake: enabling high-level SLOs on shared storage systems.

Andrew Wang

Sara Alspaugh

Randy H. Katz

Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

2011

Characterizing Data Structures for Volatile Forensics.

Ellick Chan

Proceedings of the 2011 IEEE Sixth International Workshop on Systematic Approaches to Digital Forensic Engineering, 2011

Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory.

Parthasarathy Ranganathan

Niraj Tolia

Roy H. Campbell

Proceedings of the 9th USENIX Conference on File and Storage Technologies, 2011

2010

Scaling eCGA model building via data-intensive computing.

Abhishek Verma

Xavier Llorà

David E. Goldberg

Roy H. Campbell

Proceedings of the IEEE Congress on Evolutionary Computation, 2010

Forenscope: a framework for live forensics.

Ellick Chan