Yufei Ma

ORCID: 0000-0002-2670-524X

Affiliations:
  • Peking University, Beijing, China
  • Arizona State University, Tempe, AZ, USA (former)


According to our database, Yufei Ma authored at least 31 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
30.2 A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity-Adaptive In-Memory Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

2023
Research progress on low-power artificial intelligence of things (AIoT) chip design.
Sci. China Inf. Sci., October, 2023

An 82-nW 0.53-pJ/SOP Clock-Free Spiking Neural Network With 40-μs Latency for AIoT Wake-Up Functions Using a Multilevel-Event-Driven Bionic Architecture and Computing-in-Memory Technique.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

A 22nm Delta-Sigma Computing-In-Memory (Δ∑CIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

DCIM-3DRec: A 3D Reconstruction Accelerator with Digital Computing-in-Memory and Octree-Based Scheduler.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

A Model-Specific End-to-End Design Methodology for Resource-Constrained TinyML Hardware.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 22nm 0.43pJ/SOP Sparsity-Aware In-Memory Neuromorphic Computing System with Hybrid Spiking and Artificial Neural Network and Configurable Topology.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

RIMAC: An Array-Level ADC/DAC-Free ReRAM-Based in-Memory DNN Processor with Analog Cache and Computation.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

2022
A Flexible and Efficient FPGA Accelerator for Various Large-Scale and Lightweight CNNs.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

DCIM-GCN: Digital Computing-in-Memory to Efficiently Accelerate Graph Convolutional Networks.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

2021
SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA.
Proceedings of the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2021

2020
Performance Modeling for CNN Inference Accelerators on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Efficient Inference of Large-Scale and Lightweight Convolutional Neural Networks on FPGA.
Proceedings of the 33rd IEEE International System-on-Chip Conference, 2020

Efficient Hardware Post Processing of Anchor-Based Object Detection on FPGA.
Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI, 2020

Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

In-Memory Computing: The Next-Generation AI Computing Paradigm.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference.
Proceedings of the 2020 IEEE Asia Pacific Conference on Circuits and Systems, 2020

2019
Efficient Network Construction Through Structural Plasticity.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2019

Automatic Compiler Based FPGA Accelerator for CNN Training.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

2018
Hardware Acceleration of Deep Convolutional Neural Networks on FPGA.
PhD thesis, 2018

Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA.
IEEE Trans. Very Large Scale Integr. Syst., 2018

ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler.
Integr., 2018

Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs.
Proceedings of the International Conference on Computer-Aided Design, 2018

2017
End-to-end scalable FPGA accelerator for deep residual networks.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks.
Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016
Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015
Energy-efficient reconstruction of compressively sensed bioelectrical signals with stochastic computing circuits.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015
