Yang Wang

Orcid: 0000-0002-8293-8881

Affiliations:
  • Tsinghua University, Institute of Microelectronics, Beijing, China


According to our database1, Yang Wang authored at least 41 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2025
An Energy-Efficient POSIT Compute-in-Memory Macro for High-Accuracy AI Applications.
IEEE J. Solid State Circuits, August, 2025

BeaCIM: A Digital Compute-in-Memory DNN Processor With Bi-Directional Exponent Alignment for FP8 Training.
IEEE Trans. Circuits Syst. II Express Briefs, April, 2025

CV-CIM: A Hybrid Domain Xor-Derived Similarity-Aware Computation-in-Memory Supporting Cost-Volume Construction.
IEEE J. Solid State Circuits, February, 2025

A 28-nm 28.8-TOPS/W Attention-Based NN Processor With Correlative CIM Ring Architecture and Dataflow-Reshaped Digital-Assisted CIM Array.
IEEE J. Solid State Circuits, January, 2025

14.4 A 51.6TFLOPs/W Full-Datapath CIM Macro Approaching Sparsity Bound and <<sup>-30</sup> Loss for Compound AI.
Proceedings of the IEEE International Solid-State Circuits Conference, 2025

23.8 An 88.36TOPS/W Bit-Level-Weight-Compressed Large-Language-Model Accelerator with Cluster-Aligned INT-FP-GEMM and Bi-Dimensional Workflow Reformulation.
Proceedings of the IEEE International Solid-State Circuits Conference, 2025

A 28nm Value-Wise Hybrid-Domain Compute-in-Memory Macro with Heterogeneous Memory Fabric and Asynchronous Sparsity Manager.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2025

2024
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
CoRR, 2024

A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

A 28nm 314.6TLFOPS/W Reconfigurable Floating-Point Analog Compute-In-Memory Macro with Exponent Approximation and Two-Stage Sharing TD-ADC.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

A 28nm 118.26TOPS/W Multi-Dimensional Fault-Tolerant Al Processor Enabling Voltage-Frequency Scaling Below Point-of-First-Failure.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2024

2023
Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023

An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.
IEEE J. Solid State Circuits, 2023

A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid- State Circuits Conference, 2023

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

2022
SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.
IEEE J. Solid State Circuits, 2022

HQNAS: Auto CNN deployment framework for joint quantization and architecture search.
CoRR, 2022

FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.
CoRR, 2022

A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

2021
Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".
IEEE J. Solid State Circuits, 2021

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.
IEEE J. Solid State Circuits, 2021

A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

Learnable Quantization Loss Function Based on Expectation.
Proceedings of the 26th International Conference on Automation and Computing, 2021

HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020


  Loading...