Youhe Jiang

Orcid: 0000-0001-9619-8039

According to our database, Youhe Jiang authored at least 17 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Cascadia: A Cascade Serving System for Large Language Models.
CoRR, June, 2025

Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately.
CoRR, May, 2025

HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow.
CoRR, May, 2025

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments.
CoRR, February, 2025

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs.
CoRR, February, 2025

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
IEEE Trans. Knowl. Data Eng., 2024

Revisiting the Time Cost Model of AllReduce.
CoRR, 2024

FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment.
CoRR, 2024

GNNFingers: A Fingerprinting Framework for Verifying Ownerships of Graph Neural Networks.
Proceedings of the ACM on Web Conference 2024, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
CoRR, 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism.
Proc. VLDB Endow., 2022

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
CoRR, 2022

2020
2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning.
IEEE Access, 2020
