Tong Wu
ORCID: 0009-0001-0472-5178
Affiliations:
- Beijing University of Posts and Telecommunications, Beijing, China
Bibliography
2025
OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs.
ACM Trans. Archit. Code Optim., June, 2025
Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication.
CoRR, June, 2025
SparkAttention: high-performance multi-head attention for large models on Volta GPU architecture.
CCF Trans. High Perform. Comput., February, 2025
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025