Mingzhen Li

Orcid: 0000-0002-4115-9072

Affiliations:
  • Chinese Academy of Sciences, Institute of Computing Technology, State Key Lab of Processors (SKLP), Beijing, China
  • Beihang University, Beijing, China (PhD 2023)


According to our database, Mingzhen Li authored at least 40 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials.
CoRR, May, 2026

Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials.
CoRR, April, 2026

Efficient large-scale sparse LU factorization for fast radio frequency circuit simulation.
Int. J. High Perform. Comput. Appl., 2026

2025
Large-scale Neural Network Quantum States for ab initio Quantum Chemistry Simulations on Fugaku.
CoRR, June, 2025

29-Billion Atoms Molecular Dynamics Simulation With Ab Initio Accuracy on 35 Million Cores of New Sunway Supercomputer.
IEEE Trans. Computers, May, 2025

Scaling Neural-Network-Based Molecular Dynamics with Long-Range Electrostatic Interactions to 51 Nanoseconds per Day.
CoRR, April, 2025

An interpretable DeePMD-kit performance model for emerging supercomputers.
CCF Trans. High Perform. Comput., April, 2025

Scaling Neural-Network-Based Molecular Dynamics with Long-Range Electrostatic Interactions to 51 Nanoseconds per Day.
Dataset, April, 2025

Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

INSPIRIT: Adaptive Priority-based Task Scheduling for Heterogeneous Hardware.
Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

ESC: Effective Submanifold Convolution using Tensor Cores.
Proceedings of the 54th International Conference on Parallel Processing, 2025

Efficient Long Context Fine-tuning with Chunk Flow.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

FastSpMM: Leveraging Tensor Cores for Sparse Matrix Multiplication.
Proceedings of the 22nd ACM International Conference on Computing Frontiers, 2025

2024
Towards optimized tensor code generation for deep learning on sunway many-core processor.
Frontiers Comput. Sci., April, 2024

ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG.
IEEE Trans. Parallel Distributed Syst., 2024

Building a domain-specific compiler for emerging processors with a reusable approach.
Sci. China Inf. Sci., 2024

Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day.
Proceedings of the International Conference for High Performance Computing, 2024

Retrospection on the Performance Analysis Tools for Large-Scale HPC Programs.
Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

2023
Adapting combined tiling to stencil optimizations on sunway processor.
CCF Trans. High Perform. Comput., September, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

2022
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU.
Parallel Comput., 2022

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU.
CoRR, 2022

EasyScale: Accuracy-consistent Elastic Training for Deep Learning.
CoRR, 2022

FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity.
CoRR, 2022

CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Toward accelerated stencil computation by adapting tensor core unit on GPU.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

2021
The Deep Learning Compiler: A Comprehensive Survey.
IEEE Trans. Parallel Distributed Syst., 2021

swMR: A Framework for Accelerating MapReduce Applications on Sunway Taihulight.
IEEE Trans. Emerg. Top. Comput., 2021

Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

PriPro: Towards Effective Privacy Protection on Edge-Cloud System running DNN Inference.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture.
IEEE Trans. Parallel Distributed Syst., 2020

The Deep Learning Compiler: A Comprehensive Survey.
CoRR, 2020

Real-Time Polyp Detection for Colonoscopy Video on CPU.
Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020

swRodinia: A Benchmark Suite for Exploiting Architecture Properties of Sunway Processor.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2018
Block-Checksum-Based Fault Tolerance for Matrix Multiplication on Large-Scale Parallel Systems.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

Multi-role SpTRSV on Sunway Many-Core Architecture.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

