Zhihao Zhang

Orcid: 0009-0002-8409-2717

Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA


According to our database, Zhihao Zhang authored at least 13 papers between 2020 and 2025.

Bibliography

2025
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning.
CoRR, April, 2025

AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding.
CoRR, January, 2025

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Accelerating Retrieval-Augmented Language Model Serving with Speculation.
CoRR, 2024

Communication Bounds for the Distributed Experts Problem.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems.
CoRR, 2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.
CoRR, 2023

2022
Spatio-Temporal Graph Dual-Attention Network for Multi-Agent Prediction and Tracking.
IEEE Trans. Intell. Transp. Syst., 2022

GradSign: Model Performance Inference with Theoretical Insights.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2020
Social-WaGDAT: Interaction-aware Trajectory Prediction via Wasserstein Graph Double-Attention Network.
CoRR, 2020

