Jingwei Sun

Orcid: 0000-0001-5098-1503

Affiliations:
  • University of Science and Technology of China, Hefei, China


According to our database, Jingwei Sun authored at least 43 papers between 2016 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Fast compiler autotuning framework using design of experiments.
CCF Trans. High Perform. Comput., April, 2026

RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs.
CoRR, March, 2026

GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning.
CoRR, March, 2026

Token Pruning for In-Context Generation in Diffusion Transformers.
CoRR, February, 2026

HOPO: Accelerating Multimodal Neural Networks Inference via Holistic Parallelism Optimization.
Proceedings of the 40th ACM International Conference on Supercomputing, 2026

Dual-Verbalizer with Label Correlation Modeling for Few-Shot Multi-Label Text Classification.
Proceedings of the Database Systems for Advanced Applications, 2026

CommitMoE: Efficient Fallback-Free MoE Inference with Offloading Under GPU Memory Constraints.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Toward Efficient SpMV in Sparse LLMs via Block Extraction and Compressed Storage.
CoRR, July, 2025

GNNPilot: A Holistic Framework for High-Performance Graph Neural Network Computations on GPUs.
ACM Trans. Archit. Code Optim., June, 2025

Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU.
Proceedings of the 39th ACM International Conference on Supercomputing, 2025

WinRS: Accelerate Winograd Backward-Filter Convolution with Tiny Workspace.
Proceedings of the 54th International Conference on Parallel Processing, 2025

A Fast Sparse Triangular Solve for Structured-grid Problems on Heterogeneous Processors.
Proceedings of the 54th International Conference on Parallel Processing, 2025

Compiler Tuning Method Based on Program Feature Extraction and Model Prediction.
Proceedings of the Evaluation Science and Engineering, 2025

Auto-tuning Compiler Flags with Pretrained Language Models and Surrogate-guided Search.
Proceedings of the Evaluation Science and Engineering, 2025

Introducing Graph Context into Language Models through Parameter-Efficient Fine-Tuning for Lexical Relation Mining.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs.
ACM Trans. Archit. Code Optim., December, 2024

AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs.
ACM Trans. Archit. Code Optim., December, 2024

AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks.
Neural Networks, January, 2024

Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

DProbe: Profiling and Predicting Multi-tenant Deep Learning Workloads for GPU Resource Scaling.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

Siesta: Synthesizing Proxy Applications for MPI Programs.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

A Learning-path based Supervised Method for Concept Prerequisite Relations Extraction in Educational Data.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

DNN-Schedule: A Predictive Scheduler for Minimizing Interference of Co-located DNN Workload.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2024

2023
FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data.
CoRR, 2023

Synthesizing Proxy Applications for MPI Programs.
CoRR, 2023

Scalable Tracing of MPI Events and Performance Metrics.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

Automated HPC Workload Generation Combining Statistical Modeling and Autoregressive Analysis.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2023

2022
Lossy Compression of Communication Traces Using Recurrent Neural Networks.
IEEE Trans. Parallel Distributed Syst., 2022

Multi-Net strategy: Accelerating physics-informed neural networks for solving partial differential equations.
Softw. Pract. Exp., 2022

Accelerating GNN Inference by Soft Channel Pruning.
Proceedings of the 13th IEEE International Symposium on Parallel Architectures, 2022

2021
Performance Analysis of Graph Neural Network Frameworks.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

An Efficient Channel-level Pruning for CNNs without Fine-tuning.
Proceedings of the International Joint Conference on Neural Networks, 2021

2020
Automated Performance Modeling of HPC Applications Using Machine Learning.
IEEE Trans. Computers, 2020

Fast Training of POI Recommendation Models Using Gradient Compression.
Proceedings of the Spatial Data and Intelligence - First International Conference, 2020

Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

An Active Learning Method for Empirical Modeling in Performance Tuning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019
Constructing Skeleton for Parallel Applications with Machine Learning Methods.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2017
Automated Performance Modeling Based on Runtime Feature Detection and Machine Learning.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

2016
SPLZ: An efficient algorithm for single source shortest path problem using compression method.
GeoInformatica, 2016

