31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding.

Pingcheng Dong

Yonghao Tan

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models Through Dual Redundancy.

Xujiang Xiang

Fengbin Tu

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture With Unified Low-Cost Iterative Error Correction.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2026

D²CIM: A 28nm 53.3 TFLOPS/W Decoding Digital CIM Macro for Efficient FlashMLA-based LLM Inference.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2026

2025

Exploiting the Memory-Compute-Coupling Feature for CIM Accelerator Design Optimization.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2025

CompAir: Synergizing Complementary PIMs and In-Transit NoC Computation for Efficient LLM Acceleration.

CoRR, September, 2025

CV-CIM: A Hybrid Domain Xor-Derived Similarity-Aware Computation-in-Memory Supporting Cost-Volume Construction.

IEEE J. Solid State Circuits, February, 2025

TensorCIM: Digital Computing-in-Memory Tensor Processor With Multichip-Module-Based Architecture for Beyond-NN Acceleration.

IEEE J. Solid State Circuits, February, 2025

Rethinking Control Flow in Spatial Architectures: Insights Into Control Flow Plane Design.

IEEE Trans. Computers, January, 2025

A 28nm 0.22μJ/Token Memory-Compute-Intensity-Aware CNN-Transformer Accelerator with Hybrid-Attention-Based Layer-Fusion and Cascaded Pruning for Semantic-Segmentation.

Proceedings of the IEEE International Solid-State Circuits Conference, 2025

CoXplorer: Multi-Staged Co-Exploration Framework for AI Model Compression and Accelerator Design.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2025

ER-DCIM: Error-Resilient Digital CIM Architecture with Run-Time MAC-Cell Error Correction.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis.

Kwang-Ting (Tim) Cheng

Chi-Ying Tsui

Proceedings of the Design, Automation & Test in Europe Conference, 2025

PAMA: Large-Scale GNN Acceleration with Pre-Aggregation in Multi-Node Architecture.

Proceedings of the 36th IEEE International Conference on Application-specific Systems, 2025

2024

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., May, 2024

HDSuper: High-Quality and High Computational Utilization Edge Super-Resolution Accelerator With Hardware-Algorithm Co-Design Techniques.

IEEE Trans. Circuits Syst. I Regul. Pap., April, 2024

MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.

IEEE J. Solid State Circuits, January, 2024

SWG: an architecture for sparse weight gradient computation.

Sci. China Inf. Sci., 2024

A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

AdaP-CIM: Compute-in-Memory Based Neural Network Accelerator Using Adaptive Posit.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023

Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.

IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023

SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.

IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023

STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition.

IEEE Trans. Circuits Syst. I Regul. Pap., 2023

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

SPG: Structure-Private Graph Database via SqueezePIR.

Proc. VLDB Endow., 2023

ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.

IEEE J. Solid State Circuits, 2023

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.

IEEE J. Solid State Circuits, 2023

A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications.

CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.

CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

BIOS: A 40nm Bionic Sensor-defined 0.47pJ/SOP, 268.7TSOPs/W Configurable Spiking Neuron-in-Memory Processor for Wearable Healthcare.

Kwang-Ting (Tim) Cheng

Proceedings of the 49th IEEE European Solid State Circuits Conference, 2023

PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

AutoDCIM: An Automated Digital CIM Compiler.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking Neural Networks.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Dynamic Sparse Attention for Scalable Transformer Acceleration.

IEEE Trans. Computers, 2022

A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

INSPIRE: in-storage private information retrieval via protocol and architecture co-design.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Accelerating Spatiotemporal Supervised Training of Large-Scale Spiking Neural Networks on GPU.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Alleviating datapath conflicts and design centralization in graph analytics acceleration.

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

DOTA: detect and omit weak attentions for scalable transformer acceleration.

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".

IEEE J. Solid State Circuits, 2021

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.

IEEE J. Solid State Circuits, 2021

Brain-Inspired Computing: Adventure from Beyond CMOS Technologies to Beyond von Neumann Architectures ICCAD Special Session Paper.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.

IEEE Trans. Parallel Distributed Syst., 2019

Reconfigurable Architecture for Neural Approximation in Multimedia Computing.

IEEE Trans. Circuits Syst. Video Technol., 2019

A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions.

Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019

Towards Efficient Compact Network Training on Edge-Devices.

Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

2018

GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.

IEEE J. Solid State Circuits, 2018

An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Bit-width Adaptive Accelerator Design for Convolution Neural Network.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.

IEEE Trans. Very Large Scale Integr. Syst., 2017

AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs.

Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

2015

Neural approximating architecture targeting multiple application domains.

Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

RNA: a reconfigurable architecture for hardware neural acceleration.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Fengbin Tu
