Jingwei Sun

Orcid: 0000-0001-5098-1503

Affiliations:

University of Science and Technology of China, Hefei, China

According to our database¹, Jingwei Sun authored at least 43 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Fast compiler autotuning framework using design of experiments.

CCF Trans. High Perform. Comput., April, 2026

RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs.

CoRR, March, 2026

GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning.

CoRR, March, 2026

Token Pruning for In-Context Generation in Diffusion Transformers.

CoRR, February, 2026

HOPO: Accelerating Multimodal Neural Networks Inference via Holistic Parallelism Optimization.

Proceedings of the 40th ACM International Conference on Supercomputing, 2026

Dual-Verbalizer with Label Correlation Modeling for Few-Shot Multi-Label Text Classification.

Proceedings of the Database Systems for Advanced Applications, 2026

CommitMoE: Efficient Fallback-Free MoE Inference with Offloading Under GPU Memory Constraints.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage.

CoRR, July, 2025

GNNPilot: A Holistic Framework for High-Performance Graph Neural Network Computations on GPUs.

ACM Trans. Archit. Code Optim., June, 2025

Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

WinRS: Accelerate Winograd Backward-Filter Convolution with Tiny Workspace.

Proceedings of the 54th International Conference on Parallel Processing, 2025

A Fast Sparse Triangular Solve for Structured-grid Problems on Heterogeneous Processors.

Proceedings of the 54th International Conference on Parallel Processing, 2025

Compiler Tuning Method Based on Program Feature Extraction and Model Prediction.

Proceedings of the Evaluation Science and Engineering, 2025

Auto-tuning Compiler Flags with Pretrained Language Models and Surrogate-guided Search.

Proceedings of the Evaluation Science and Engineering, 2025

Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs.

ACM Trans. Archit. Code Optim., December, 2024

AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs.

ACM Trans. Archit. Code Optim., December, 2024

AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks.

Neural Networks, January, 2024

Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

DProbe: Profiling and Predicting Multi-tenant Deep Learning Workloads for GPU Resource Scaling.

Proceedings of the Euro-Par 2024: Parallel Processing, 2024

Siesta: Synthesizing Proxy Applications for MPI Programs.

Proceedings of the IEEE International Conference on Cluster Computing, 2024

A Learning-path based Supervised Method for Concept Prerequisite Relations Extraction in Educational Data.

Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

DNN-Schedule: A Predictive Scheduler for Minimizing Interference of Co-located DNN Workload.

Proceedings of the Benchmarking, Measuring, and Optimizing, 2024

2023

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data.

CoRR, 2023

Synthesizing Proxy Applications for MPI Programs.

CoRR, 2023

Scalable Tracing of MPI Events and Performance Metrics.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs.

Proceedings of the 52nd International Conference on Parallel Processing, 2023

GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network.

Proceedings of the IEEE International Conference on Cluster Computing, 2023

Automated HPC Workload Generation Combining Statistical Modeling and Autoregressive Analysis.

Proceedings of the Benchmarking, Measuring, and Optimizing, 2023

2022

Lossy Compression of Communication Traces Using Recurrent Neural Networks.

IEEE Trans. Parallel Distributed Syst., 2022

Multi-Net strategy: Accelerating physics-informed neural networks for solving partial differential equations.

Softw. Pract. Exp., 2022

Accelerating GNN Inference by Soft Channel Pruning.

Proceedings of the 13th IEEE International Symposium on Parallel Architectures, 2022

2021

Performance Analysis of Graph Neural Network Frameworks.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

An Efficient Channel-level Pruning for CNNs without Fine-tuning.

Proceedings of the International Joint Conference on Neural Networks, 2021

2020

Automated Performance Modeling of HPC Applications Using Machine Learning.

IEEE Trans. Computers, 2020

Fast Training of POI Recommendation Models Using Gradient Compression.

Proceedings of the Spatial Data and Intelligence - First International Conference, 2020

Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

An Active Learning Method for Empirical Modeling in Performance Tuning.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

Constructing Skeleton for Parallel Applications with Machine Learning Methods.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2017

Automated Performance Modeling Based on Runtime Feature Detection and Machine Learning.

Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

2016

SPLZ: An efficient algorithm for single source shortest path problem using compression method.

GeoInformatica, 2016
