Youhui Bai

ORCID: 0009-0007-6073-7011

According to our database, Youhui Bai authored at least 17 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

nnScaler-M: Constraint-Guided and Placement-Aware Parallelization Plan Generation for Deep Learning Training.

IEEE Trans. Parallel Distributed Syst., July, 2026

Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism.

CoRR, May, 2026

AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation.

CoRR, April, 2026

Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training.

CoRR, February, 2026

SMIDT: High-Performance Inference Framework for MoE Models with Dynamic Top-K Routing.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

CLO: Efficient LLM Inference System with CPU-Light KVCache Offloading via Algorithm-System Co-Design.

CoRR, November, 2025

A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training.

IEEE Trans. Parallel Distributed Syst., July, 2025

Efficient Long-Context LLM Inference via KV Cache Clustering.

CoRR, June, 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference.

CoRR, February, 2025

HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference.

CoRR, 2024

2023

A Survey on Auto-Parallelism of Large-Scale Deep Learning Training.

IEEE Trans. Parallel Distributed Syst., August, 2023

MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2021

Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs.

IEEE Trans. Parallel Distributed Syst., 2021

Gradient Compression Supercharged High-Performance Data Parallel DNN Training.

Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

2017

PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout.

Proceedings of the 46th International Conference on Parallel Processing, 2017