We stand with Ukraine

Qingxiao Sun

Orcid: 0000-0003-2927-362X

According to our database¹, Qingxiao Sun authored at least 37 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

APERTURE: Algorithm-System Co-optimization for Temporal Graph Network Inference.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Accelerating Sparse Transformer Inference on GPU.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

Optimizing Streaming Tensor Decomposition on GPU.

Proceedings of the 40th ACM International Conference on Supercomputing, 2026

Efficient Temporal Graph Network Training via Unified Redundancy Elimination.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

xGR: Efficient Generative Recommendation Serving at Scale.

CoRR, December, 2025

νGNN: Non-Uniformly partitioned full-graph GNN training on mixed GPUs.

CCF Trans. High Perform. Comput., August, 2025

Flexible Operator Fusion for Fast Sparse Transformer with Diverse Masking on GPU.

CoRR, June, 2025

Convergence-aware operator-wise mixed-precision training.

CCF Trans. High Perform. Comput., February, 2025

KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU.

Proceedings of the International Conference for High Performance Computing, 2025

INSPIRIT: Adaptive Priority-based Task Scheduling for Heterogeneous Hardware.

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Accelerating Complex Stencil Computations with Adaptive Fusion Strategy.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

EVASION: Efficient KV CAche CompreSsion vIa PrOduct QuaNtization.

Proceedings of the Design, Automation & Test in Europe Conference, 2025

PISA: Efficient Precision-Slice Framework for LLMs with Adaptive Numerical Type.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

MILLION: MasterIng Long-Context LLM Inference Via Outlier-Immunized KV Product QuaNtization.

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024

Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs.

IEEE Trans. Parallel Distributed Syst., January, 2024

ScalFrag: Efficient Tiled-MTTKRP with Adaptive Launching on GPUs.

Proceedings of the IEEE International Conference on Cluster Computing, 2024

GeST: Generalized Stencil Auto-tuning Framework on GPUs.

Proceedings of the ACM Turing Award Celebration Conference 2024, 2024

2023

Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022

Input-Aware Sparse Tensor Storage Format Selection for Optimizing MTTKRP.

IEEE Trans. Computers, 2022

QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU.

Parallel Comput., 2022

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU.

CoRR, 2022

CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

StencilMART: Predicting Optimization Selection for Stencil Computations across GPUs.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Towards Optimized Streaming Tensor Completion on multiple GPUs.

Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

2021

The Deep Learning Compiler: A Comprehensive Survey.

IEEE Trans. Parallel Distributed Syst., 2021

Towards efficient canonical polyadic decomposition on sunway many-core processor.

Inf. Sci., 2021

Highly scalable parallel genetic algorithm on Sunway many-core processors.

Future Gener. Comput. Syst., 2021

An optimized tensor completion library for multiple GPUs.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs.

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

The Deep Learning Compiler: A Comprehensive Survey.

CoRR, 2020

SpTFS: sparse tensor format selection for MTTKRP via deep learning.

Proceedings of the International Conference for High Performance Computing, 2020

Accelerating De Novo Assembler WTDBG2 on Commodity Servers.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

2019

Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory.

ACM Trans. Archit. Code Optim., 2019

SMQoS: Improving Utilization and Energy Efficiency with QoS Awareness on GPUs.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
