Zhenhua Zhu

ORCID: 0009-0007-9259-7180

Affiliations:
  • Hong Kong University of Science and Technology (HKUST), Hong Kong
  • Tsinghua University, Department of Electrical Engineering, BNRist, Beijing, China (PhD 2024)


According to our database, Zhenhua Zhu authored at least 71 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
ReNN-RV: Run-Time PE Reconfiguration for DNN Inference Acceleration With Custom RISC-V ISA.
IEEE Trans. Computers, May, 2026

CD-LLM: A Heterogeneous Multi-FPGA System for Batched Decoding of 70B+ LLMs Using a Compute-Dedicated Architecture.
ACM Trans. Reconfigurable Technol. Syst., March, 2026

Towards Floating Point-Based AI Acceleration: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications.
ACM Trans. Design Autom. Electr. Syst., January, 2026

STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning.
Proceedings of the 21st European Conference on Computer Systems, 2026

Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering.
Proceedings of the 21st European Conference on Computer Systems, 2026

SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture.
Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026

2025
Cross-Layer Design and Design Automation for In-Memory Computing Based on Nonvolatile Memory Technologies.
IEEE Des. Test, December, 2025

Exploiting the Memory-Compute-Coupling Feature for CIM Accelerator Design Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2025

db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism.
CoRR, November, 2025

Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design.
CoRR, November, 2025

Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training.
CoRR, July, 2025

HyCTor: A Hybrid CNN-Transformer Network Accelerator With Flexible Weight/Output Stationary Dataflow and Multicore Extension.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2025

FlashOverlap: A Lightweight Design for Efficiently Overlapping Communication and Computation.
CoRR, April, 2025

REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems.
Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025

Deep Neural Network Inference Partitioning in Embedded Hybrid Analog-Digital Systems.
Proceedings of the 26th International Symposium on Quality Electronic Design, 2025

How Do Errors Impact NN Accuracy on Non-Ideal Analog PIM? Fast Evaluation via an Error-Injected Robustness Metric.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

TB-STC: Transposable Block-wise N:M Structured Sparse Tensor Core.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

FMC-LLM: Enabling FPGAs for Efficient Batched Decoding of 70B+ LLMs with a Memory-Centric Streaming Architecture.
Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025

HPIM-NoC: A Priori-Knowledge-Based Optimization Framework for Heterogeneous PIM-Based NoCs.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

PARO: Hardware-Software Co-design with Pattern-aware Reorder-based Attention Quantization in Video Generation Models.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024
Toward High-Accuracy and Real-Time Two-Stage Small Object Detection on FPGA.
IEEE Trans. Circuits Syst. Video Technol., September, 2024

TDPP: 2-D Permutation-Based Protection of Memristive Deep Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024

LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization.
CoRR, 2024

Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors using Graph-based Approximate Nearest Neighbor Search.
CoRR, 2024

Efficient Deployment of Large Language Model across Cloud-Device Systems.
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

MOTPE/D: Hardware and Algorithm Co-design for Reconfigurable Neuromorphic Processor.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

GLITCHES: GPU-FPGA LLM Inference Through a Collaborative Heterogeneous System.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2024

DyPIM: Dynamic-Inference-Enabled Processing-In-Memory Accelerator.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

DySpMM: From Fix to Dynamic for Sparse Matrix-Matrix Multiplication Accelerators.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023

MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Gibbon: An Efficient Co-Exploration Framework of NN Model and Processing-In-Memory Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.
IEEE Trans. Computers, May, 2023

TDPP: Two-Dimensional Permutation-Based Protection of Memristive Deep Neural Networks.
CoRR, 2023

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

SRAM-Based Processing-In-Memory Design with Kullback-Leibler Divergence-Based Dynamic Precision Quantization.
Proceedings of the Great Lakes Symposium on VLSI 2023, 2023

Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
Exploring the Potential of Low-Bit Training of Convolutional Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter.
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022

WESCO: Weight-encoded Reliability and Security Co-design for In-memory Computing Systems.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

DIMMining: pruning-efficient and parallel graph mining on near-memory-computing.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Gibbon: Efficient Co-Exploration of NN Model and Processing-In-Memory Architecture.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture.
ACM Trans. Design Autom. Electr. Syst., 2021

Enabling Lower-Power Charge-Domain Nonvolatile In-Memory Computing With Ferroelectric FETs.
IEEE Trans. Circuits Syst. II Express Briefs, 2021

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

MNSIM-TIME: Performance Modeling Framework for Training-In-Memory Architectures.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
FTT-NAS: Discovering Fault-Tolerant Neural Architecture.
CoRR, 2020

Efficient 16 Boolean logic and arithmetic based on bipolar oxide memristors.
Sci. China Inf. Sci., 2020

MNSIM 2.0: A Behavior-Level Modeling Tool for Memristor-based Neuromorphic Computing Systems.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

Security Enhancement for RRAM Computing System through Obfuscating Crossbar Row Connections.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

An Energy-Efficient Quantized and Regularized Training Framework For Processing-In-Memory Accelerators.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

2019
TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

HDC-IM: Hyperdimensional Computing In-Memory Architecture based on RRAM.
Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019

A General Logic Synthesis Framework for Memristor-based Logic Design.
Proceedings of the International Conference on Computer-Aided Design, 2019

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Learning the sparsity for ReRAM: mapping and pruning sparse neural network for ReRAM based accelerator.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

2018
Mixed size crossbar based RRAM CNN accelerator with overlapped mapping method.
Proceedings of the International Conference on Computer-Aided Design, 2018

Rescuing memristor-based computing with non-linear resistance levels.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

Training low bitwidth convolutional neural network on RRAM.
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018

2017
TIME: A Training-in-memory Architecture for Memristor-based Deep Neural Networks.
Proceedings of the 54th Annual Design Automation Conference, 2017

