Zhihao Jia

Orcid: 0000-0002-1270-5185

According to our database, Zhihao Jia authored at least 53 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances.
CoRR, 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning.
CoRR, 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
CoRR, 2024

Accelerating Retrieval-Augmented Language Model Serving with Speculation.
CoRR, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.
CoRR, 2024

2023
Optimizing DNNs With Partially Equivalent Transformations and Automated Corrections.
IEEE Trans. Computers, December, 2023

Dynamic Correlation Adjacency-Matrix-Based Graph Neural Networks for Traffic Flow Prediction.
Sensors, March, 2023

SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training.
Proc. VLDB Endow., 2023

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems.
CoRR, 2023

SpotServe: Serving Generative Large Language Models on Preemptible Instances.
CoRR, 2023

Drone-NeRF: Efficient NeRF Based 3D Scene Reconstruction for Large-Scale Drone Survey.
CoRR, 2023

Quarl: A Learning-Based Quantum Circuit Optimizer.
CoRR, 2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.
CoRR, 2023

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

EINNET: Optimizing Tensor Programs with Derivation-Based Transformations.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

2022
TOD: GPU-accelerated Outlier Detection via Tensor Operations.
Proc. VLDB Endow., 2022

Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks.
CoRR, 2022

OLLIE: Derivation-based Tensor Program Optimizer.
CoRR, 2022

Benchmarking Node Outlier Detection on Graphs.
CoRR, 2022

Optimizing Mixture of Experts using Dynamic Recompilations.
CoRR, 2022

PyGOD: A Python Library for Graph Outlier Detection.
CoRR, 2022

Quartz: Superoptimization of Quantum Circuits (Extended Version).
CoRR, 2022

Quartz: superoptimization of Quantum circuits.
Proceedings of the PLDI '22: 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, San Diego, CA, USA, June 13, 2022

Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022


GradSign: Model Performance Inference with Theoretical Insights.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Quanto: Optimizing Quantum Circuits with Automatic Generation of Circuit Identities.
CoRR, 2021

Collage: Automated Integration of Deep Learning Backends.
CoRR, 2021

TOD: Tensor-based Outlier Detection.
CoRR, 2021

Scaling implicit parallelism via dynamic control replication.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

IOS: Inter-Operator Scheduler for CNN Acceleration.
Proceedings of Machine Learning and Systems 2021, 2021

2020
Automated discovery of machine learning optimizations.
PhD thesis, 2020

Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc.
Proceedings of Machine Learning and Systems 2020, 2020

Redundancy-Free Computation for Graph Neural Networks.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
Redundancy-Free Computation Graphs for Graph Neural Networks.
CoRR, 2019

TASO: optimizing deep learning computation with automatic generation of graph substitutions.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Beyond Data and Model Parallelism for Deep Neural Networks.
Proceedings of Machine Learning and Systems 2019, 2019

Optimizing DNN Computation with Relaxed Graph Substitutions.
Proceedings of Machine Learning and Systems 2019, 2019

2018
Research on user behavior clustering algorithm based on mobile application.
J. Intell. Fuzzy Syst., 2018

Isometry: A Path-Based Distributed Data Transfer System.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
A Distributed Multi-GPU System for Fast Graph Processing.
Proc. VLDB Endow., 2017

Integrating External Resources with a Task-Based Programming Model.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016
SLIK: Scalable Low-Latency Indexes for a Key-Value Store.
Proceedings of the 2016 USENIX Annual Technical Conference, 2016

2015
Automatic and transparent I/O optimization with storage integrated application runtime support.
Proceedings of the 10th Parallel Data Storage Workshop, 2015

2012
Improving Integer Security for Systems with KINT.
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

Undefined behavior: what happened to my code?
Proceedings of the Asia-Pacific Workshop on Systems, 2012

