Youhui Bai

ORCID: 0009-0007-6073-7011

According to our database, Youhui Bai authored at least 11 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
A Generic, High-Performance, Compression-Aware Framework for Data Parallel DNN Training.
IEEE Trans. Parallel Distributed Syst., July, 2025

Efficient Long-Context LLM Inference via KV Cache Clustering.
CoRR, June, 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference.
CoRR, February, 2025

HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference.
Findings of the Association for Computational Linguistics, 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference.
CoRR, 2024

2023
A Survey on Auto-Parallelism of Large-Scale Deep Learning Training.
IEEE Trans. Parallel Distributed Syst., August, 2023

MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2021
Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs.
IEEE Trans. Parallel Distributed Syst., 2021

Gradient Compression Supercharged High-Performance Data Parallel DNN Training.
Proceedings of the 28th ACM SIGOPS Symposium on Operating Systems Principles (SOSP '21), 2021

2017
PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout.
Proceedings of the 46th International Conference on Parallel Processing, 2017
