Rangharajan Venkatesan

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

2025

ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models.

Akshat Ramachandran

Marina Neseem

Charbel Sakr

Tushar Krishna

CoRR, October, 2025

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference.

Coleman Hooper

Charbel Sakr

Kurt Keutzer

Yakun Sophia Shao

CoRR, April, 2025

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity.

Zichen Fan

Dennis Sylvester

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024

Vision Transformer Computation and Resilience for Dynamic Inference.

Kavya Sreedhar

Mark Horowitz

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

2023

A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm.

IEEE J. Solid State Circuits, 2023

Efficient Transformer Inference with Statically Structured Sparse Attention.

Hasan Genc

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.

Jiawei Zhao

IEEE Trans. Computers, 2022

Fair and Comprehensive Benchmarking of Machine Learning Processing Chips.

Geoffrey W. Burr

Sukhwan Lim

Boris Murmann

Marian Verhelst

IEEE Des. Test, 2022

Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications.

Kavya Sreedhar

Mark Horowitz

CoRR, 2022

A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits, 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training.

Charbel Sakr

Brian Zimmer

William J. Dally

Proceedings of the International Conference on Machine Learning, 2022

2021

Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.

Jiawei Zhao

CoRR, 2021

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking.

Alicia Klinefelter

Haoxing Ren

CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.

CoRR, 2021

Simba: scaling deep-learning inference with chiplet-based architecture.

Yakun Sophia Shao

Commun. ACM, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Session 3 Overview: Highlighted Chip Releases: Modern Digital SoCs Invited Papers.

Thomas Burd

Dennis Sylvester

Proceedings of the IEEE International Solid-State Circuits Conference, 2021

IPA: Floorplan-Aware SystemC Interconnect Performance Modeling and Generation for HLS-based SoCs.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers.

Jacob R. Stevens

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

Accelerating Chip Design With Machine Learning.

Yanqing Zhang

Bryan Catanzaro

William J. Dally

IEEE Micro, 2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.

Brian Zimmer

IEEE J. Solid State Circuits, 2020

2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.

Brian Zimmer

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.

Yakun Sophia Shao

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.

Joel S. Emer

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference.

Proceedings of the 56th Annual Design Automation Conference, 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.

Christopher W. Fletcher

Joel S. Emer

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

A modular digital VLSI flow for high-productivity SoC design.

Evgeni Khmer

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

STAxCache: An approximate, energy efficient STT-MRAM cache.

Ashish Ranjan

Swagath Venkataramani

Zoha Pajouhi

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2016

Emulation-Based Analysis of System-on-Chip Performance Under Variations.

Sujit Dey

IEEE Trans. Very Large Scale Integr. Syst., 2016

Embedding Read-Only Memory in Spin-Transfer Torque MRAM-Based On-Chip Caches.

Dongsoo Lee

IEEE Trans. Very Large Scale Integr. Syst., 2016

Cache Design with Domain Wall Memory.

IEEE Trans. Computers, 2016

Spin-Transfer Torque Memories: Devices, Circuits, and Systems.

Yusung Kim

Sri Harsha Choday

Proc. IEEE, 2016

Asymmetric Underlapped FinFETs for Near- and Super-Threshold Logic at Sub-10nm Technology Nodes.

A. Arun Goud

ACM J. Emerg. Technol. Comput. Syst., 2016

A real-time energy-efficient superpixel hardware accelerator for mobile computer vision applications.

Injoon Hong

Iuri Frosio

Proceedings of the 53rd Annual Design Automation Conference, 2016

2015

Energy-Efficient All-Spin Cache Hierarchy Using Shift-Based Writes and Multilevel Storage.

ACM J. Emerg. Technol. Comput. Syst., 2015

Spintastic: spin-based stochastic logic for energy-efficient computing.

Swagath Venkataramani

Shankar Ganesh Ramasubramanian

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

DyReCTape: a dynamically reconfigurable cache using domain wall memory tapes.

Ashish Ranjan

Vijay S. Pai

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Asymmetric underlapped FinFET based robust SRAM design at 7nm node.

A. Arun Goud

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014

Computing with Spintronics: Circuits and Architectures.

Shankar Ganesh Ramasubramanian

PhD thesis, 2014

SPINDLE: SPINtronic deep learning engine for large-scale neuromorphic computing.

Proceedings of the International Symposium on Low Power Electronics and Design, 2014

STAG: Spintronic-Tape Architecture for GPGPU cache hierarchies.

Shankar Ganesh Ramasubramanian

Swagath Venkataramani

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013

Reading spin-torque memory with spin-torque sensors.

Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2013

Multi-level magnetic RAM using domain wall shift for energy-efficient, high-density caches.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

DWM-TAPESTRI - an energy efficient all-spin cache using domain wall shift based writes.

Proceedings of the Design, Automation and Test in Europe, 2013

2012

TapeCache: a high density, energy efficient cache based on domain wall memory.

Proceedings of the International Symposium on Low Power Electronics and Design, 2012

2011

Energy efficient many-core processor for recognition and mining using spin-based memory.

Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures, 2011

MACACO: Modeling and analysis of circuits for approximate computing.

Amit Agarwal

Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, 2011

VESPA: Variability emulation for System-on-Chip performance analysis.
