Guanghui He

Orcid: 0000-0002-0486-6421

Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China


According to our database, Guanghui He authored at least 91 papers between 2005 and 2025.

Bibliography

2025
IPDR: An Inter-Chiplet Priority-Driven Deadlock Resolution for 2-D/2.5-D Multichiplet Systems.
IEEE Trans. Very Large Scale Integr. Syst., September, 2025

Physics-guided atmospheric restoration network for single image dehazing.
Signal Image Video Process., September, 2025

OFQ-LLM: Outlier-Flexing Quantization for Efficient Low-Bit Large Language Model Acceleration.
IEEE Trans. Circuits Syst. I Regul. Pap., August, 2025

An Efficient Multi-View Cross-Attention Accelerator for Vision-Centric 3D Perception in Autonomous Driving.
IEEE Trans. Circuits Syst. I Regul. Pap., July, 2025

Neural Rendering Acceleration With Deferred Neural Decoding and Voxel-Centric Data Flow.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2025

Lightweight image super-resolution based on mixer-based focal modulation network.
Signal Image Video Process., July, 2025

SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations.
CoRR, July, 2025

DESA: Dataflow Efficient Systolic Array for Acceleration of Transformers.
IEEE Trans. Computers, June, 2025

Efficient Hardware Architecture Design for Rotary Position Embedding of Large Language Models.
IEEE J. Emerg. Sel. Topics Circuits Syst., June, 2025

Adaptive Two-Range Quantization and Hardware Co-Design for Large Language Model Acceleration.
IEEE J. Emerg. Sel. Topics Circuits Syst., June, 2025

Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization.
CoRR, June, 2025

HyCTor: A Hybrid CNN-Transformer Network Accelerator With Flexible Weight/Output Stationary Dataflow and Multicore Extension.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2025

Phydiisp: a physics-guided differentiable pipeline for low-light machine vision.
Signal Image Video Process., May, 2025

Lightweight image super-resolution network based on graph-based deep learning.
Signal Image Video Process., March, 2025

COSA Plus: Enhanced Co-Operative Systolic Arrays for Attention Mechanism in Transformers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., February, 2025

Diffusion models for image super-resolution: State-of-the-art and future directions.
Neurocomputing, 2025

Lightweight image super-resolution network based on dynamic graph message passing and convolution mixer.
Expert Syst. Appl., 2025

VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

BitPattern: Enabling Efficient Bit-Serial Acceleration of Deep Neural Networks through Bit-Pattern Pruning.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference.
Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

SparseTrim: A Neural Network Accelerator Featuring On-Chip Decompression of Fine-Grained Sparse Model with 10.1TOPS/W System Energy Efficiency.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2025

2024
M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture.
IEEE Trans. Very Large Scale Integr. Syst., October, 2024

BSViT: A Bit-Serial Vision Transformer Accelerator Exploiting Dynamic Patch and Weight Bit-Group Quantization.
IEEE Trans. Circuits Syst. I Regul. Pap., September, 2024

CoDA: A Co-Design Framework for Versatile and Efficient Attention Accelerators.
IEEE Trans. Computers, August, 2024

A Broad-Spectrum and High-Throughput Compression Engine for Neural Network Processors.
IEEE Trans. Circuits Syst. II Express Briefs, July, 2024

Quantization and Hardware Architecture Co-Design for Matrix-Vector Multiplications of Large Language Models.
IEEE Trans. Circuits Syst. I Regul. Pap., June, 2024

INDM: Chiplet-Based Interconnect Network and Dataflow Mapping for DNN Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

TICA: Timing Slack Inference and Clock Frequency Adaption Technique for a Deeply Pipelined Near-Threshold-Voltage Bitcoin Mining Core.
IEEE J. Solid State Circuits, February, 2024

A Precision-Scalable Deep Neural Network Accelerator With Activation Sparsity Exploitation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., January, 2024

DTDeMo: A Deep Learning-Based Two-Stage Image Demosaicing Model With Interpolation and Enhancement.
IEEE Trans. Computational Imaging, 2024

Lightweight image super-resolution network based on extended convolution mixer.
Eng. Appl. Artif. Intell., 2024

Hardware-oriented algorithms for softmax and layer normalization of large language models.
Sci. China Inf. Sci., 2024

Efficient image super-resolution based on transformer with bidirectional interaction.
Appl. Soft Comput., 2024

VEGA: Implementing a Versatile and Efficient Deep Learning Processor with Graph-Based ALU.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Lightweight image super-resolution based multi-order gated aggregation network.
Neural Networks, September, 2023

Lightweight image super-resolution based on deep learning: State-of-the-art and future directions.
Inf. Fusion, June, 2023

Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators.
IEEE Trans. Circuits Syst. II Express Briefs, April, 2023

GEM: A Generalized Memristor Device Modeling Framework Based on Neural Network for Transient Circuit Simulation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2023

CCSA: A 394TOPS/W Mixed-Signal GPS Accelerator with Charge-Based Correlation Computing for Signal Acquisition.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

SpOctA: A 3D Sparse Convolution Accelerator with Octree-Encoding-Based Map Search and Inherent Sparsity-Aware Processing.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

COSA: Co-Operative Systolic Arrays for Multi-head Attention Mechanism in Neural Network using Hybrid Data Reuse and Fusion Methodologies.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

FLNA: An Energy-Efficient Point Cloud Feature Learning Accelerator with Dataflow Decoupling.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023


NTIRE 2023 Challenge on Stereo Image Super-Resolution: Methods and Results.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mixer-based Local Residual Network for Lightweight Image Super-resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

A Simple Transformer-style Network for Lightweight Image Super-resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MUG5: Modeling of Universal Chiplet Interconnect Express (UCIe) Standard Based on gem5.
Proceedings of the 15th IEEE International Conference on ASIC, 2023

2022
Efficient Compression Methods for Wire-Spread-Based Stochastic Computing Deep Neural Networks.
IEEE Trans. Circuits Syst. II Express Briefs, 2022

XBarNet: Computationally Efficient Memristor Crossbar Model Using Convolutional Autoencoder.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Balanced Spatial Feature Distillation and Pyramid Attention Network for Lightweight Image Super-resolution.
Neurocomputing, 2022

An Efficient Stochastic Convolution Accelerator based on Pseudo-Sobol Sequences.
Proceedings of the 17th ACM International Symposium on Nanoscale Architectures, 2022


Real-Time Channel Mixing Net for Mobile Image Super-Resolution.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

TICA: A 0.3V, Variation-Resilient 64-Stage Deeply-Pipelined Bitcoin Mining Core with Timing Slack Inference and Clock Frequency Adaption.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2022

2021
A 3.85-Gb/s 8 × 8 Soft-Output MIMO Detector With Lattice-Reduction-Aided Channel Preprocessing.
IEEE Trans. Very Large Scale Integr. Syst., 2021

Efficient and Robust RRAM-Based Convolutional Weight Mapping With Shifted and Duplicated Kernel.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

TEANS: A Target Enhancement and Attenuated Nonmaximum Suppression Object Detector for Remote Sensing Images.
IEEE Geosci. Remote. Sens. Lett., 2021

A Low-Latency FPGA Implementation for Real-Time Object Detection.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Fast FPGA-Based Emulation for ReRAM-Enabled Deep Neural Network Accelerator.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Subgraph Decoupling and Rescheduling for Increased Utilization in CGRA Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Reducing Memory Access Conflicts with Loop Transformation and Data Reuse on Coarse-grained Reconfigurable Architecture.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

CCASM: A Computation- and Communication-Aware Scheduling and Mapping Algorithm for NoC-Based DNN Accelerators.
Proceedings of the 14th IEEE International Conference on ASIC, 2021

2020
Algorithm and Architecture of an Efficient MIMO Detector With Cross-Level Parallel Tree-Search.
IEEE Trans. Very Large Scale Integr. Syst., 2020

A Hierarchical Scrubbing Technique for SEU Mitigation on SRAM-Based FPGAs.
IEEE Trans. Very Large Scale Integr. Syst., 2020

Hardware Implementation of an Improved Stochastic Computing Based Deep Neural Network Using Short Sequence Length.
IEEE Trans. Circuits Syst., 2020

An Efficient Massive MIMO Detector Based on Second-Order Richardson Iteration: From Algorithm to Flexible Architecture.
IEEE Trans. Circuits Syst., 2020

Model Order Reduction Based on Dynamic Relative Gain Array for MIMO Systems.
IEEE Trans. Circuits Syst., 2020

A Deeply Fused Detection Algorithm Based on Steepest Descent and Non-Stationary Richardson Iteration for Massive MIMO Systems.
IEEE Commun. Lett., 2020

Decoupling the Multi-Rate Dataflow Execution in Coarse-Grained Reconfigurable Array.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

Enabling Resistive-RAM-based Activation Functions for Deep Neural Network Acceleration.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

2019
A Novel Resistive Memory-based Process-in-memory Architecture for Efficient Logic and Add Operations.
ACM Trans. Design Autom. Electr. Syst., 2019

Scale Adaptive Proposal Network for Object Detection in Remote Sensing Images.
IEEE Geosci. Remote. Sens. Lett., 2019

A Rapid Scrubbing Technique for SEU Mitigation on SRAM-Based FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

AR-C3D: Action Recognition Accelerator for Human-Computer Interaction on FPGA.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

On-chip Learning of Multilayer Perceptron Based on Memristors with Limited Multilevel States.
Proceedings of the IEEE International Conference on Artificial Intelligence Circuits and Systems, 2019

2018
A Self-Powered 3.26-µW 70-m Wireless Temperature Sensor Node for Power Grid Monitoring.
IEEE Trans. Ind. Electron., 2018

2017
A 12-bit 4928 × 3264 pixel CMOS image signal processor for digital still cameras.
Integr., 2017

A hardware-friendly hierarchical HEVC motion estimation algorithm for UHD applications.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Short path padding with multiple-Vt cells for wide-pulsed-latch based circuits at ultra-low voltage.
Proceedings of the 12th IEEE International Conference on ASIC, 2017

2016
High performance parallel turbo decoder with configurable interleaving network for LTE application.
Integr., 2016

Area-efficient HEVC IDCT/IDST architecture for 8K × 4K video decoding.
IEICE Electron. Express, 2016

2015
Design and Implementation of Flexible Dual-Mode Soft-Output MIMO Detector With Channel Preprocessing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2015

Improved Iterative Receiver for Co-channel Interference Suppression in MIMO-OFDM Systems.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015

2014
Improved Max-Log-MAP BICM-IDD receiver for MIMO systems.
IEICE Electron. Express, 2014

Area and throughput efficient IDCT/IDST architecture for HEVC standard.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

2013
A soft-output parallel stack algorithm for MIMO detection.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

2011
Generalized interleaving network based on configurable QPP architecture for parallel turbo decoder.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2011

Effective multi-standard macroblock prediction VLSI design for reconfigurable multimedia systems.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

2006
A single receiving chip for DVB data broadcasting system.
IEEE Trans. Consumer Electron., 2006

2005
The design and implementation of a DVB receiving chip with PCI interface.
Proceedings of the 2005 Conference on Asia South Pacific Design Automation, 2005

