Zhuohan Li

ORCID: 0009-0004-1534-9106

Affiliations:
  • University of California, Berkeley, CA, USA


According to our database, Zhuohan Li authored at least 22 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Fairness in Serving Large Language Models.
CoRR, 2024

2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

2022
On Optimizing the Communication of Model Parallelism.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021
Rearchitecting In-Memory Object Stores for Low Latency.
Proc. VLDB Endow., 2021

Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems.
Proceedings of the ACM SIGCOMM 2021 Conference, 2021

Simple and Automatic Distributed Machine Learning on Ray.
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
CoRR, 2020

Hoplite: Efficient Collective Communication for Task-Based Distributed Systems.
CoRR, 2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View.
CoRR, 2019

Fast Structured Decoding for Sequence Models.
Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019

Efficient Training of BERT by Progressively Stacking.
Proceedings of the 36th International Conference on Machine Learning, 2019

Hint-Based Training for Non-Autoregressive Machine Translation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Towards Binary-Valued Gates for Robust LSTM Training.
Proceedings of the 35th International Conference on Machine Learning, 2018
