Zhenhua Zhu
ORCID: 0009-0007-9259-7180
Affiliations:
- Hong Kong University of Science and Technology (HKUST), Hong Kong
- Tsinghua University, Department of Electrical Engineering, BNRist, Beijing, China (PhD 2024)
According to our database, Zhenhua Zhu authored at least 71 papers between 2017 and 2026.
Bibliography
2026
ReNN-RV: Run-Time PE Reconfiguration for DNN Inference Acceleration With Custom RISC-V ISA.
IEEE Trans. Computers, May, 2026
CD-LLM: A Heterogeneous Multi-FPGA System for Batched Decoding of 70B+ LLMs Using a Compute-Dedicated Architecture.
ACM Trans. Reconfigurable Technol. Syst., March, 2026
Towards Floating Point-Based AI Acceleration: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications.
ACM Trans. Design Autom. Electr. Syst., January, 2026
STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning.
Proceedings of the 21st European Conference on Computer Systems, 2026
Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering.
Proceedings of the 21st European Conference on Computer Systems, 2026
SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture.
Proceedings of the 31st Asia and South Pacific Design Automation Conference, 2026
2025
Cross-Layer Design and Design Automation for In-Memory Computing Based on Nonvolatile Memory Technologies.
IEEE Des. Test, December, 2025
Exploiting the Memory-Compute-Coupling Feature for CIM Accelerator Design Optimization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2025
db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism.
CoRR, November, 2025
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design.
CoRR, November, 2025
Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training.
CoRR, July, 2025
HyCTor: A Hybrid CNN-Transformer Network Accelerator With Flexible Weight/Output Stationary Dataflow and Multicore Extension.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2025
FlashOverlap: A Lightweight Design for Efficiently Overlapping Communication and Computation.
CoRR, April, 2025
REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems.
Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025
Deep Neural Network Inference Partitioning in Embedded Hybrid Analog-Digital Systems.
Proceedings of the 26th International Symposium on Quality Electronic Design, 2025
How Do Errors Impact NN Accuracy on Non-Ideal Analog PIM? Fast Evaluation via an Error-Injected Robustness Metric.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025
UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
FMC-LLM: Enabling FPGAs for Efficient Batched Decoding of 70B+ LLMs with a Memory-Centric Streaming Architecture.
Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025
HPIM-NoC: A Priori-Knowledge-Based Optimization Framework for Heterogeneous PIM-Based NoCs.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025
PARO: Hardware-Software Co-design with Pattern-aware Reorder-based Attention Quantization in Video Generation Models.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025
2024
IEEE Trans. Circuits Syst. Video Technol., September, 2024
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024
LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization.
CoRR, 2024
Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors using Graph-based Approximate Nearest Neighbor Search.
CoRR, 2024
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024
Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024
Proceedings of the IEEE High Performance Extreme Computing Conference, 2024
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., December, 2023
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023
Gibbon: An Efficient Co-Exploration Framework of NN Model and Processing-In-Memory Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023
Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective.
IEEE Trans. Computers, May, 2023
TDPP: Two-Dimensional Permutation-Based Protection of Memristive Deep Neural Networks.
CoRR, 2023
DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
SRAM-Based Processing-In-Memory Design with Kullback-Leibler Divergence-Based Dynamic Precision Quantization.
Proceedings of the Great Lakes Symposium on VLSI 2023, 2023
Minimizing Communication Conflicts in Network-On-Chip Based Processing-In-Memory Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023
PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
Memory-Efficient and Real-Time SPAD-based dToF Depth Sensor with Spatial and Statistical Correlation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Proceedings of the 23rd IEEE International Conference on Mobile Data Management, 2022
WESCO: Weight-encoded Reliability and Security Co-design for In-memory Computing Systems.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Exploiting Parallelism with Vertex-Clustering in Processing-In-Memory-based GCN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022
2021
ACM Trans. Design Autom. Electr. Syst., 2021
Enabling Lower-Power Charge-Domain Nonvolatile In-Memory Computing With Ferroelectric FETs.
IEEE Trans. Circuits Syst. II Express Briefs, 2021
Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021
2020
Sci. China Inf. Sci., 2020
MNSIM 2.0: A Behavior-Level Modeling Tool for Memristor-based Neuromorphic Computing Systems.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020
Security Enhancement for RRAM Computing System through Obfuscating Crossbar Row Connections.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020
An Energy-Efficient Quantized and Regularized Training Framework For Processing-In-Memory Accelerators.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020
2019
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019
Proceedings of the International Conference on Computer-Aided Design, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Learning the sparsity for ReRAM: mapping and pruning sparse neural network for ReRAM based accelerator.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019
2018
Proceedings of the International Conference on Computer-Aided Design, 2018
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
Proceedings of the 23rd Asia and South Pacific Design Automation Conference, 2018
2017
Proceedings of the 54th Annual Design Automation Conference, 2017