Huimin Cui

ORCID: 0000-0002-2491-7679

According to our database, Huimin Cui authored at least 39 papers between 2007 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Portable and Scalable All-Electron Quantum Perturbation Simulations on Exascale Supercomputers.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023

Honeycomb: Secure and Efficient GPU Executions via Static Validation.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Scaling Poisson Solvers on Many Cores via MMEwald.
IEEE Trans. Parallel Distributed Syst., 2022

2021
Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories.
ACM Trans. Comput. Syst., 2021

Accelerating all-electron ab initio simulation of raman spectra for biological systems.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021

NRHI: A Concurrent Non-Rehashing Hash Index for Persistent Memory.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing.
ACM Trans. Archit. Code Optim., 2020

Referee: A Pattern-Guided Approach for Auto Design in Compiler-Based Analyzers.
Proceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2020

VTensor: Using Virtual Tensors to Build a Layout-oblivious AI Programming Framework.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '20), 2020

Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '20), 2020

2019
Panthera: holistic memory management for big data processing over hybrid memories.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion.
Proceedings of the 28th International Conference on Compiler Construction, 2019

2018
NVM Streaker: a fast and reconfigurable performance simulator for non-volatile memory-based memory architecture.
J. Supercomput., 2018

Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

On Retargeting the AI Programming Framework to New Hardwares.
Proceedings of Network and Parallel Computing, 2018

Automating the Exchangeability of Shared Data Abstractions.
Proceedings of Languages and Compilers for Parallel Computing, 2018

Characterizing DNN Models for Edge-Cloud Computing.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Revisiting Loop Tiling for Datacenters: Live and Let Live.
Proceedings of the 32nd International Conference on Supercomputing, 2018

2016
Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis.
IEEE Trans. Parallel Distributed Syst., 2016

Adaptive control for uncertain discrete-time systems with unknown disturbance based on RNN.
Artif. Intell. Res., 2016

Articulation points guided redundancy elimination for betweenness centrality.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

2015
WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers.
J. Supercomput., 2015

A novel single multiplicative neuron model trained by an improved glowworm swarm optimization algorithm for time series prediction.
Knowl. Based Syst., 2015

Global μ-stability of impulsive reaction-diffusion neural networks with unbounded time-varying delays and bounded continuously distributed delays.
Neurocomputing, 2015

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014
Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms.
J. Comput. Sci. Technol., 2014

Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A collaborative divide-and-conquer K-means clustering algorithm for processing large data.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

2013
Layout-oblivious compiler optimization for matrix computations.
ACM Trans. Archit. Code Optim., 2013

An empirical model for predicting cross-core performance interference on multicore processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Extendable pattern-oriented optimization directives.
ACM Trans. Archit. Code Optim., 2012

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs.
J. Comput. Sci. Technol., 2012

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Layout-oblivious optimization for matrix computations.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Automatic Library Generation for BLAS3 on GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
Landing Stencil Code on Godson-T.
J. Comput. Sci. Technol., 2010

An adaptive task creation strategy for work-stealing scheduling.
Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2010), 2010

2007
Optimized Register Renaming Scheme for Stack-Based x86 Operations.
Proceedings of Architecture of Computing Systems, 2007

