Hao Zhang

ORCID: 0009-0003-8392-3977

Affiliations:
  • University of California San Diego, La Jolla, CA, USA
  • University of California Berkeley, CA, USA
  • Carnegie Mellon University, Pittsburgh, PA, USA (former)
  • Petuum Inc., Pittsburgh, PA, USA (former)


According to our database, Hao Zhang authored at least 69 papers between 2014 and 2025.

Bibliography

2025
lmgame-Bench: How Good are LLMs at Playing Games?
CoRR, May, 2025

VSA: Faster Video Diffusion with Trainable Sparse Attention.
CoRR, May, 2025

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile.
CoRR, February, 2025

Fast Video Generation with Sliding Tile Attention.
CoRR, February, 2025

GameArena: Evaluating LLM Reasoning through Live Computer Games.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Scaling Long Context Training Data by Long-Distance Referrals.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Efficiently Serving LLM Reasoning Programs with Certaindex.
CoRR, 2024

Specifications: The missing link to making the development of LLM systems an engineering discipline.
CoRR, 2024

MPC-Minimized Secure LLM Inference.
CoRR, 2024

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024

Toward Inference-optimal Mixture-of-Expert Large Language Models.
CoRR, 2024

MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving.
CoRR, 2024

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient LLM Scheduling by Learning to Rank.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Online Speculative Decoding.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Optimizing the Communication of Model Parallelism.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

MPCFormer: Fast, Performant and Private Transformer Inference with MPC.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
DNB: A Joint Learning Framework for Deep Bayesian Nonparametric Clustering.
IEEE Trans. Neural Networks Learn. Syst., 2022

MPCFormer: fast, performant and private Transformer inference with MPC.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Machine Learning Parallelism Could Be Adaptive, Composable and Automated.
PhD thesis, 2021

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

Simple and Automatic Distributed Machine Learning on Ray.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.
Proceedings of the 38th International Conference on Machine Learning, 2021

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning.
CoRR, 2020

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
CoRR, 2020

AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
AutoLoss: Learning Discrete Schedule for Alternate Optimization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Toward Understanding the Impact of Staleness in Distributed Machine Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
AutoLoss: Learning Discrete Schedules for Alternate Optimization.
CoRR, 2018

Cavs: An Efficient Runtime System for Dynamic Neural Networks.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Symbolic Graph Reasoning Meets Convolutions.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-Rays.
Proceedings of the Deep Learning in Medical Image Analysis - and - Multimodal Learning for Clinical Decision Support, 2018

Generative Semantic Manipulation with Mask-Contrasting GAN.
Proceedings of the Computer Vision - ECCV 2018, 2018

2017
Cavs: A Vertex-centric Programming Interface for Dynamic Neural Networks.
CoRR, 2017

Generative Semantic Manipulation with Contrasting GAN.
CoRR, 2017

SCAN: Structure Correcting Adversarial Network for Chest X-rays Organ Segmentation.
CoRR, 2017

ZM-Net: Real-time Zero-shot Image Manipulation Network.
CoRR, 2017

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

Structured Generative Adversarial Networks.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Recurrent Topic-Transition GAN for Visual Paragraph Generation.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Automatic Photo Adjustment Using Deep Neural Networks.
ACM Trans. Graph., 2016

Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation.
CoRR, 2016

GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

Learning Concept Taxonomies from Multi-modal Data.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines.
CoRR, 2015

Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015

HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Regularizing DNN acoustic models with Gaussian stochastic neurons.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Semi-supervised training in low-resource ASR and KWS.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014
Automatic Photo Adjustment Using Deep Learning.
CoRR, 2014

Improvements to speaker adaptive training of deep neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Towards speaker adaptive training of deep neural network acoustic models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Distributed learning of multilingual DNN feature extractors using GPUs.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
