Wenlei Bao

Orcid: 0009-0009-0826-8283

According to our database, Wenlei Bao authored at least 16 papers between 2014 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs.
CoRR, May, 2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
Proceedings of the 21st European Conference on Computer Systems, 2026

2025
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD.
CoRR, September, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April, 2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

2024
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.
CoRR, 2024

2019
NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques.
CoRR, 2019

2018
Analytical modeling of cache behavior for affine programs.
Proc. ACM Program. Lang., 2018

2017
Efficient Cache Simulation for Affine Computations.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

2016
Static and Dynamic Frequency Scaling on Multicore CPUs.
ACM Trans. Archit. Code Optim., 2016

PolyCheck: dynamic verification of iteration space transformations on affine programs.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

Effective padding of multidimensional arrays to avoid cache conflict misses.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

2014
PWCET: Power-Aware Worst Case Execution Time Analysis.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

