Ying Sheng

Orcid: 0000-0002-1883-2126

Affiliations:
  • Stanford University, CA, USA


According to our database1, Ying Sheng authored at least 35 papers between 2020 and 2025.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving.
CoRR, May, 2025

Locality-aware Fair Scheduling in LLM Serving.
CoRR, January, 2025

DafnyBench: A Benchmark for Formal Software Verification.
Trans. Mach. Learn. Res., 2025

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution.
CoRR, 2024

Post-Training Sparse Attention with Double Sparsity.
CoRR, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors.
CoRR, 2024

Clover: Closed-Loop Verifiable Code Generation.
Proceedings of the AI Verification - First International Symposium, 2024

Fairness in Serving Large Language Models.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

SGLang: Efficient Execution of Structured Language Model Programs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SLoRA: Scalable Serving of Thousands of LoRA Adapters.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Inference-Friendly Models With MixAttention.
Proceedings of the NeurIPS Efficient Natural Language and Speech Processing Workshop, 2024

2023
Combining Stable Infiniteness and (Strong) Politeness.
J. Autom. Reason., December, 2023

Reasoning About Vectors: Satisfiability Modulo a Theory of Sequences.
J. Autom. Reason., September, 2023

Efficiently Programming Large Language Models using SGLang.
CoRR, 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters.
CoRR, 2023

H<sub>2</sub>O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

On Optimal Caching and Model Multiplexing for Large Model Inference.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Towards Optimal Caching and Model Selection for Large Model Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

2022
Read-once refutations in Horn constraint systems: an algorithmic approach.
J. Log. Comput., 2022

Polite Combination of Algebraic Datatypes.
J. Autom. Reason., 2022

cvc5: A Versatile and Industrial-Strength SMT Solver.
Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems, 2022

Reasoning About Vectors Using an SMT Theory of Sequences.
Proceedings of the Automated Reasoning - 11th International Joint Conference, 2022

2021
Politeness for the Theory of Algebraic Datatypes (Extended Abstract).
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Politeness and Stable Infiniteness: Stronger Together.
Proceedings of the Automated Deduction - CADE 28, 2021

2020
Politeness for the Theory of Algebraic Datatypes.
Proceedings of the Automated Reasoning - 10th International Joint Conference, 2020


  Loading...