Paul N. Whatmough

CoRR, 2020

Compressing Language Models using Doped Kronecker Products.

CoRR, 2020

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation.

CoRR, 2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development.

CoRR, 2020

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference.

Zhi Gang Liu

IEEE Comput. Archit. Lett., 2020

The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines.

IEEE Comput. Archit. Lett., 2020

A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm.

Proceedings of the IEEE Symposium on VLSI Circuits, 2020

Searching for Winograd-aware Quantized Networks.

Javier Fernández-Marqués

Andrew Mundy

Proceedings of Machine Learning and Systems 2020, 2020

Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids.

Proceedings of the Interspeech 2020, 2020

ISP4ML: The Role of Image Signal Processing in Efficient Deep Learning Vision Systems.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

A Scalable Bayesian Inference Accelerator for Unsupervised Learning.

Proceedings of the IEEE Hot Chips 32 Symposium, 2020

2019

A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs.

IEEE J. Solid State Circuits, 2019

Guest Editors' Introduction: Hardware and Algorithms for Energy-Constrained On-Chip Machine Learning (Part 2).

ACM J. Emerg. Technol. Comput. Syst., 2019

Guest Editors' Introduction to the Special Section on Hardware and Algorithms for Energy-Constrained On-chip Machine Learning.

ACM J. Emerg. Technol. Comput. Syst., 2019

ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems.

CoRR, 2019

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning.

Shreyas K. Venkataramanaiah

Chuteng Zhou

Patrick Hansen

Jae-sun Seo

CoRR, 2019

A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

FixyNN: Energy-Efficient Real-Time Mobile Computer Vision Hardware Acceleration via Transfer Learning.

Shreyas K. Venkataramanaiah

Chuteng Zhou

Patrick Hansen

Jae-sun Seo

Proceedings of Machine Learning and Systems 2019, 2019

ASV: Accelerated Stereo Vision System.

Yu Feng

Yuhao Zhu

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

GeST: An Automatic Framework For Generating CPU Stress-Tests.

Zacharias Hadjilambrou

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications.

IEEE J. Solid State Circuits, 2018

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning.

CoRR, 2018

SCALE-Sim: Systolic CNN Accelerator.

CoRR, 2018

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective.

Yuhao Zhu

CoRR, 2018

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET.

Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

Ares: a framework for quantifying the resilience of deep neural networks.

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Deep Learning for Computer Architects

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01756-8, 2017

Power Integrity Analysis of a 28 nm Dual-Core ARM Cortex-A57 Cluster Using an All-Digital Power Delivery Monitor.

Zacharias Hadjilambrou

José Miguel Hernández-Lobato

IEEE J. Solid State Circuits, 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.

Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A case for efficient accelerator design space exploration via Bayesian optimization.

Brandon Reagen

Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Applications of Deep Neural Networks for Ultra Low Power IoT.

Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Sub-uJ deep neural networks for embedded applications.

Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016

Sequence-Aware Watermark Design for Soft IP Embedded Processors.

IEEE Trans. Very Large Scale Integr. Syst., 2016

A low-power correlator for wakeup receivers with algorithm pruning through early termination.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators.

José Miguel Hernández-Lobato

Gu-Yeon Wei

David M. Brooks

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

2015

A 0.6V all-digital body-coupled wakeup transceiver for IoT applications.

Proceedings of the Symposium on VLSI Circuits, 2015

14.6 An all-digital power-delivery monitor for analysis of a 28nm dual-core ARM Cortex-A57 cluster.

Zacharias Hadjilambrou

Proceedings of the 2015 IEEE International Solid-State Circuits Conference, 2015

Analysis of adaptive clocking technique for resonant supply voltage noise mitigation.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Modeling and characterization of the system-level Power Delivery Network for a dual-core ARM Cortex-A57 cluster in 28nm CMOS.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

2014

Precision-Energy-Throughput Scaling of Generic Matrix Multiplication and Convolution Kernels via Linear Projections.

Mohammad Ashraful Anam

Yiannis Andreopoulos

IEEE Trans. Circuits Syst. Video Technol., 2014

A Low-Power 1-GHz Razor FIR Accelerator With Time-Borrow Tracking Pipeline and Approximate Error Correction in 65-nm CMOS.

IEEE J. Solid State Circuits, 2014

Clock-modulation based watermark for protection of embedded processors.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Circuit-Level Timing Error Tolerance for Low-Power DSP Filters and Transforms.

IEEE Trans. Very Large Scale Integr. Syst., 2013

A low-power 1GHz razor FIR accelerator with time-borrow tracking pipeline and approximate error correction in 65nm CMOS.

Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

Precision-energy-throughput scaling of generic matrix multiplication and discrete convolution kernels via linear projections.

Mohammad Ashraful Anam