Huimin Cui

Orcid: 0000-0002-2491-7679

According to our database, Huimin Cui authored at least 79 papers between 2007 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

BePilot: An AI Programming Assistant for Compiler Backend Development.

ACM Trans. Softw. Eng. Methodol., June, 2026

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages.

CoRR, May, 2026

Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation.

CoRR, April, 2026

The new compiler stack: a survey on the synergy of LLMs and compilers.

CCF Trans. High Perform. Comput., April, 2026

Beyond Pass-by-Pass Optimization: Intent-Driven IR Optimization with Large Language Models.

CoRR, February, 2026

SYCL-MLU: unifying SIMT and SIMD in heterogeneous programming.

CCF Trans. High Perform. Comput., February, 2026

Hummingbird: SLO-Oriented GPU Preemption at Microsecond-scale.

CoRR, January, 2026

Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

2025

OptiFX: Automatic Optimization for Convolutional Neural Networks with Aggressive Operator Fusion on GPUs.

ACM Trans. Archit. Code Optim., June, 2025

SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations.

ACM Trans. Archit. Code Optim., June, 2025

QiMeng: Fully Automated Hardware and Software Design for Processor Chip.

CoRR, June, 2025

LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability.

CoRR, May, 2025

Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms.

CoRR, March, 2025

Scalable tasking runtime with parallelized builders for explicit message passing architectures.

Parallel Comput., 2025

Hybrid Instruction Scheduling Algorithm for RISC-V VLIW Architecture.

Int. J. Softw. Informatics, 2025

A New Design Method for Anti-Interception Waveform.

Yanli Hou

Xuan Liu

Huimin Cui

IEICE Trans. Commun., 2025

Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

TENSORMD: Accelerating Molecular Dynamics with a High-Performance Machine Learning Interatomic Potential.

Proceedings of the International Conference for High Performance Computing, 2025

Boosting Large Language Models for System Software Retargeting: A Preliminary Study.

Proceedings of the IEEE International Conference on Software Analysis, 2025

TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms.

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

Beehive: A Scalable Disaggregated Memory Runtime Exploiting Asynchrony of Multithreaded Programs.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

IR-OptSet: An Optimization-Sensitive Dataset for Advancing LLM-Based IR Optimizer.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

SpaceServe: Spatial Multiplexing of Complementary Encoders and Decoders for Multimodal LLMs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Light-FP: Analyze Floating-Point Error in a Highly Condensed Approach.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

TopServe: Task-Operator Co-scheduling for Efficient Multi-DNN Inference Serving on GPUs.

Proceedings of the Euro-Par 2025: Parallel Processing, 2025

VEGA: Automatically Generating Compiler Backends using a Pre-trained Transformer Model.

Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

Qiwu: Exploiting Ciphertext-Level SIMD Parallelism in Homomorphic Encryption Programs.

Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

2024

Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectra of a Biological System with 100 Million Atoms.

Proceedings of the International Conference for High Performance Computing, 2024

A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Introducing Compiler Semantics into Large Language Models as Programming Language Translators: A Case Study of C to x86 Assembly.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Optimizing Dynamic-Shape Neural Networks on Accelerators via On-the-Fly Micro-Kernel Polymerization.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Automatic Target Description File Generation.

J. Comput. Sci. Technol., December, 2023

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions.

Dataset, October, 2023

Reinvent Cloud Software Stacks for Resource Disaggregation.

J. Comput. Sci. Technol., September, 2023

Portable and Scalable All-Electron Quantum Perturbation Simulations on Exascale Supercomputers.

Proceedings of the International Conference for High Performance Computing, 2023

Honeycomb: Secure and Efficient GPU Executions via Static Validation.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Scaling Poisson Solvers on Many Cores via MMEwald.

IEEE Trans. Parallel Distributed Syst., 2022

2021

Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories.

ACM Trans. Comput. Syst., 2021

Accelerating all-electron ab initio simulation of Raman spectra for biological systems.

Proceedings of the International Conference for High Performance Computing, 2021

NRHI: A Concurrent Non-Rehashing Hash Index for Persistent Memory.

Xinyu Li

Huimin Cui

Lei Liu

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing.

ACM Trans. Archit. Code Optim., 2020

Referee: A Pattern-Guided Approach for Auto Design in Compiler-Based Analyzers.

Proceedings of the 27th IEEE International Conference on Software Analysis, 2020

VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Panthera: holistic memory management for big data processing over hybrid memories.

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion.

Proceedings of the 28th International Conference on Compiler Construction, 2019

2018

NVM Streaker: a fast and reconfigurable performance simulator for non-volatile memory-based memory architecture.

J. Supercomput., 2018

Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

On Retargeting the AI Programming Framework to New Hardwares.

Proceedings of the Network and Parallel Computing, 2018

Automating the Exchangeability of Shared Data Abstractions.

Proceedings of the Languages and Compilers for Parallel Computing, 2018

Characterizing DNN Models for Edge-Cloud Computing.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Revisiting Loop Tiling for Datacenters: Live and Let Live.

Proceedings of the 32nd International Conference on Supercomputing, 2018

2017

Parallel Incremental Frequent Itemset Mining for Large Data.

Yu-Geng Song

Hui-Min Cui

Xiaobing Feng

J. Comput. Sci. Technol., 2017

2016

Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis.

IEEE Trans. Parallel Distributed Syst., 2016

Adaptive control for uncertain discrete-time systems with unknown disturbance based on RNN.

Artif. Intell. Res., 2016

Articulation points guided redundancy elimination for betweenness centrality.

Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

2015

WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers.

J. Supercomput., 2015

A novel single multiplicative neuron model trained by an improved glowworm swarm optimization algorithm for time series prediction.

Knowl. Based Syst., 2015

Global μ-stability of impulsive reaction-diffusion neural networks with unbounded time-varying delays and bounded continuously distributed delays.

Neurocomputing, 2015

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Research and improvement of DV_HOP localization algorithm in wireless sensor networks.

Proceedings of the 2015 International Conference on Machine Learning and Cybernetics, 2015

2014

Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms.

J. Comput. Sci. Technol., 2014

Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations.

Qing Yi

Qian Wang

Huimin Cui

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A collaborative divide-and-conquer K-means clustering algorithm for processing large data.

Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013

Layout-oblivious compiler optimization for matrix computations.

ACM Trans. Archit. Code Optim., 2013

An empirical model for predicting cross-core performance interference on multicore processors.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs.

J. Comput. Sci. Technol., 2012

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Layout-oblivious optimization for matrix computations.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Automatic Library Generation for BLAS3 on GPUs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Extendable pattern-oriented optimization directives.

Proceedings of the CGO 2011, 2011

2010

Landing Stencil Code on Godson-T.

J. Comput. Sci. Technol., 2010

An adaptive task creation strategy for work-stealing scheduling.

Proceedings of the CGO 2010, 2010

2007

Optimized Register Renaming Scheme for Stack-Based x86 Operations.

Proceedings of the Architecture of Computing Systems, 2007
