Huanqi Cao

Orcid: 0000-0002-3870-106X

According to our database, Huanqi Cao authored at least 15 papers between 2017 and 2023.

Bibliography

2023
Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid.
Proc. ACM Program. Lang., October, 2023

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs.
ACM Trans. Storage, May, 2023

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.
CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.
CoRR, 2023


2022
Design and Implementation of ShenWei Universal C/C++.
CoRR, 2022

Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver.
CoRR, 2022

Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Scaling graph traversal to 281 trillion edges with 40 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

2021
Chukonu: A Fully-Featured Big Data Processing System by Efficiently Integrating a Native Compute Engine into Spark.
Proc. VLDB Endow., 2021

CPM: A large-scale generative Chinese Pre-trained language model.
AI Open, 2021

Sparker: Efficient Reduction for More Scalable Machine Learning with Spark.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2019
T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2017
A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on sunway many-core processor.
Proceedings of the 3rd International Conference on Communication and Information Processing, 2017

