Youhe Jiang

Orcid: 0000-0001-9619-8039

According to our database, Youhe Jiang authored at least 17 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Cascadia: A Cascade Serving System for Large Language Models.
CoRR, June, 2025

Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately.
CoRR, May, 2025

HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow.
CoRR, May, 2025

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments.
CoRR, February, 2025

Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs.
CoRR, February, 2025

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
IEEE Trans. Knowl. Data Eng., 2024

Revisiting the Time Cost Model of AllReduce.
CoRR, 2024

FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment.
CoRR, 2024

GNNFingers: A Fingerprinting Framework for Verifying Ownerships of Graph Neural Networks.
Proceedings of the ACM on Web Conference 2024, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
CoRR, 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism.
Proc. VLDB Endow., 2022

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
CoRR, 2022

2020
2D-HRA: Two-Dimensional Hierarchical Ring-Based All-Reduce Algorithm in Large-Scale Distributed Machine Learning.
IEEE Access, 2020
