Paul N. Whatmough

Orcid: 0000-0002-1865-6492

According to our database, Paul N. Whatmough authored at least 88 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization.
CoRR, 2024

2023
A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs.
IEEE J. Solid State Circuits, February, 2023

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices.
CoRR, 2023

Fast and Accurate: Video Enhancement Using Sparse Depth.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

AR-PIM: An Adaptive-Range Processing-in-Memory Architecture.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

Efficient Edge Inference by Selective Query.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
ML-HW Co-Design of Noise-Robust TinyML Models and Always-On Analog Compute-in-Memory Edge Accelerator.
IEEE Micro, 2022

SMIV: A 16-nm 25-mm² SoC for IoT With Arm Cortex-A53, eFPGA, and Coherent Accelerators.
IEEE J. Solid State Circuits, 2022

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators.
CoRR, 2022

Restructurable Activation Networks.
CoRR, 2022

UDC: Unified DNAS for Compressible TinyML Models.
CoRR, 2022

UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Braum: Analyzing and Protecting Autonomous Machine Software Stack.
Proceedings of the IEEE 33rd International Symposium on Software Reliability Engineering, 2022

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Super-Efficient Super Resolution for Fast Adversarial Defense at the Edge.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator.
CoRR, 2021

A LiDAR-Guided Framework for Video Enhancement.
CoRR, 2021

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices.
CoRR, 2021

Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots.
CoRR, 2021

Information contraction in noisy binary neural networks and its implications.
CoRR, 2021

Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices.
Proceedings of Machine Learning and Systems 2021, 2021

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers.
Proceedings of Machine Learning and Systems 2021, 2021

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

9.8 A 25mm² SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Strong data processing inequality in neural networks with noisy neurons and its implications.
Proceedings of the IEEE International Symposium on Information Theory, 2021

Debiasing Model Updates for Improving Personalized Federated Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Federated Learning Based on Dynamic Regularization.
Proceedings of the 9th International Conference on Learning Representations, 2021

SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications: The 33rd Hot Chips Symposium - August 22-24, 2021.
Proceedings of the IEEE Hot Chips 33 Symposium, 2021

FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

2020
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads.
ACM Trans. Archit. Code Optim., 2020

CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development.
IEEE Micro, 2020

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration.
CoRR, 2020

Compressing Language Models using Doped Kronecker Products.
CoRR, 2020

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation.
CoRR, 2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development.
CoRR, 2020

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference.
IEEE Comput. Archit. Lett., 2020

The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines.
IEEE Comput. Archit. Lett., 2020

A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020

Searching for Winograd-aware Quantized Networks.
Proceedings of Machine Learning and Systems 2020, 2020

Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids.
Proceedings of the Interspeech 2020, 2020

ISP4ML: The Role of Image Signal Processing in Efficient Deep Learning Vision Systems.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

A Scalable Bayesian Inference Accelerator for Unsupervised Learning.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

2019
A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs.
IEEE J. Solid State Circuits, 2019

Guest Editors' Introduction: Hardware and Algorithms for Energy-Constrained On-Chip Machine Learning (Part 2).
ACM J. Emerg. Technol. Comput. Syst., 2019

Guest Editors' Introduction to the Special Section on Hardware and Algorithms for Energy-Constrained On-chip Machine Learning.
ACM J. Emerg. Technol. Comput. Syst., 2019

ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems.
CoRR, 2019

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning.
CoRR, 2019

A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

FixyNN: Energy-Efficient Real-Time Mobile Computer Vision Hardware Acceleration via Transfer Learning.
Proceedings of Machine Learning and Systems 2019, 2019

ASV: Accelerated Stereo Vision System.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

GeST: An Automatic Framework For Generating CPU Stress-Tests.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications.
IEEE J. Solid State Circuits, 2018

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning.
CoRR, 2018

SCALE-Sim: Systolic CNN Accelerator.
CoRR, 2018

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective.
CoRR, 2018

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET.
Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

Ares: a framework for quantifying the resilience of deep neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Deep Learning for Computer Architects.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01756-8, 2017

Power Integrity Analysis of a 28 nm Dual-Core ARM Cortex-A57 Cluster Using an All-Digital Power Delivery Monitor.
IEEE J. Solid State Circuits, 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A case for efficient accelerator design space exploration via Bayesian optimization.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Applications of Deep Neural Networks for Ultra Low Power IoT.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Sub-uJ deep neural networks for embedded applications.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Sequence-Aware Watermark Design for Soft IP Embedded Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2016

A low-power correlator for wakeup receivers with algorithm pruning through early termination.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

2015
A 0.6V all-digital body-coupled wakeup transceiver for IoT applications.
Proceedings of the Symposium on VLSI Circuits, 2015

14.6 An all-digital power-delivery monitor for analysis of a 28nm dual-core ARM Cortex-A57 cluster.
Proceedings of the 2015 IEEE International Solid-State Circuits Conference, 2015

Analysis of adaptive clocking technique for resonant supply voltage noise mitigation.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Modeling and characterization of the system-level Power Delivery Network for a dual-core ARM Cortex-A57 cluster in 28nm CMOS.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

2014
Precision-Energy-Throughput Scaling of Generic Matrix Multiplication and Convolution Kernels via Linear Projections.
IEEE Trans. Circuits Syst. Video Technol., 2014

A Low-Power 1-GHz Razor FIR Accelerator With Time-Borrow Tracking Pipeline and Approximate Error Correction in 65-nm CMOS.
IEEE J. Solid State Circuits, 2014

Clock-modulation based watermark for protection of embedded processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Circuit-Level Timing Error Tolerance for Low-Power DSP Filters and Transforms.
IEEE Trans. Very Large Scale Integr. Syst., 2013

A low-power 1GHz razor FIR accelerator with time-borrow tracking pipeline and approximate error correction in 65nm CMOS.
Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

Precision-energy-throughput scaling of generic matrix multiplication and discrete convolution kernels via linear projections.
Proceedings of the 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2013

2012
VLSI Architecture for a Reconfigurable Spectrally Efficient FDM Baseband Transmitter.
IEEE Trans. Circuits Syst. I Regul. Pap., 2012

Selective time borrowing for DSP pipelines with hybrid voltage control loop.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

2011
Error-resilient low-power DSP via path-delay shaping.
Proceedings of the 48th Design Automation Conference, 2011

2010
A robust FIR filter with in situ error detection.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

2009
System-Efficiency Analysis of Power Amplifier Supply-Tracking Regimes in Mobile Transmitters.
IEEE Trans. Circuits Syst. I Regul. Pap., 2009

