David Brooks

Orcid: 0000-0002-0662-7889

Affiliations:
  • Harvard University, School of Engineering and Applied Sciences, Cambridge, MA, USA


According to our database, David Brooks authored at least 247 papers between 1999 and 2024.

Bibliography

2024
Silent Data Corruption in Robot Operating System: A Case for End-to-End System-Level Fault Analysis Using Autonomous UAVs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

Guac: Energy-Aware and SSA-Based Generation of Coarse-Grained Merged Accelerators from LLVM-IR.
CoRR, 2024

Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU.
CoRR, 2024


CAMEL: Co-Designing AI Models and eDRAMs for Efficient On-Device Learning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

GPU-based Private Information Retrieval for On-Device Machine Learning Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
SoCProbe: Compositional Post-Silicon Validation of Heterogeneous NoC-Based SoCs.
IEEE Des. Test, December, 2023

Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials.
Int. J. High Perform. Comput. Appl., July, 2023

Early DSE and Automatic Generation of Coarse-grained Merged Accelerators.
ACM Trans. Embed. Comput. Syst., March, 2023

A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs.
IEEE J. Solid State Circuits, February, 2023

A Binary-Activation, Multi-Level Weight RNN and Training Algorithm for ADC-/DAC-Free and Noise-Resilient Processing-in-Memory Inference With eNVM.
IEEE Trans. Emerg. Top. Comput., 2023

Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration.
ACM Trans. Embed. Comput. Syst., 2023

Architectural CO₂ Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool.
IEEE Micro, 2023

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation.
CoRR, 2023

Hardware Resilience Properties of Text-Guided Image Classifiers.
CoRR, 2023

Carbon Responder: Coordinating Demand Response for the Datacenter Fleet.
CoRR, 2023

MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems.
CoRR, 2023

Guess & Sketch: Language Model Guided Transpilation.
CoRR, 2023

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation.
CoRR, 2023

CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning.
CoRR, 2023

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems.
CoRR, 2023

GreenScale: Carbon-Aware Systems for Edge Computing.
CoRR, 2023

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices.
CoRR, 2023

Hardware Resilience Properties of Text-Guided Image Classifiers.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

S³: Increasing GPU Utilization during Generative Inference for Higher Throughput.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

Is the Future Cold or Tall? Design Space Exploration of Cryogenic and 3D Embedded Cache Memory.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

VelociTI: An Architecture-level Performance Modeling Framework for Trapped Ion Quantum Computers.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Carbon-Efficient Design Optimization for Computing Systems.
Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
End-to-End Synthesis of Dynamically Controlled Machine Learning Accelerators.
IEEE Trans. Computers, 2022

Chasing Carbon: The Elusive Environmental Footprint of Computing.
IEEE Micro, 2022

Bridging Python to Silicon: The SODA Toolchain.
IEEE Micro, 2022

SMIV: A 16-nm 25-mm² SoC for IoT With Arm Cortex-A53, eFPGA, and Coherent Accelerators.
IEEE J. Solid State Circuits, 2022

Architectural Implications of Embedding Dimension during GCN on CPU and GPU.
CoRR, 2022

Impala: Low-Latency, Communication-Efficient Private Deep Learning Inference.
CoRR, 2022

Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference.
CoRR, 2022

A Holistic Approach for Designing Carbon Aware Datacenters.
CoRR, 2022

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration.
CoRR, 2022


Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

ACT: designing sustainable computer systems with an architectural carbon modeling tool.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

A Scalable Methodology for Agile Chip Development with Open-Source Hardware Components.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

CoopMC: Algorithm-Architecture Co-Optimization for Markov Chain Monte Carlo Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

From High-Level Frameworks to custom Silicon with SODA.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

A 12nm Agile-Designed SoC for Swarm-Based Perception with Heterogeneous IP Blocks, a Reconfigurable Memory Hierarchy, and an 800MHz Multi-Plane NoC.
Proceedings of the 48th IEEE European Solid State Circuits Conference, 2022

GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators.
Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

A joint management middleware to improve training performance of deep recommendation systems with SSDs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
Exploiting Parallelism Opportunities with Deep Learning Frameworks.
ACM Trans. Archit. Code Optim., 2021

Sustainable AI: Environmental Implications, Challenges and Opportunities.
CoRR, 2021

MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.
CoRR, 2021

Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots.
CoRR, 2021

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

9.8 A 25mm² SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021

Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix.
Proceedings of the 38th International Conference on Machine Learning, 2021

Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications: The 33rd Hot Chips Symposium - August 22-24, 2021.
Proceedings of the IEEE Hot Chips 33 Symposium, 2021

RecSSD: near data processing for solid state drive based recommendation inference.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Towards Automatic and Agile AI/ML Accelerator Design with End-to-End Synthesis.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads.
ACM Trans. Archit. Code Optim., 2020

CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development.
IEEE Micro, 2020

EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP.
CoRR, 2020

Cheetah: Optimizations and Methods for Privacy-Preserving Inference via Homomorphic Encryption.
CoRR, 2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development.
CoRR, 2020

The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines.
IEEE Comput. Archit. Lett., 2020

A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020

A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms.
Proceedings of Machine Learning and Systems 2020, 2020


A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Cross-Stack Workload Characterization of Deep Recommendation Systems.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

SODA: a New Synthesis Infrastructure for Agile Hardware Design of Machine Learning Accelerators.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

The Architectural Implications of Facebook's DNN-Based Personalized Recommendation.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

A Scalable Bayesian Inference Accelerator for Unsupervised Learning.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

Emerging Neural Workloads and Their Impact on Hardware.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Invited: Software Defined Accelerators From Learning Tools Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Predicting New Workload or CPU Performance by Analyzing Public Datasets.
ACM Trans. Archit. Code Optim., 2019

MEMTI: Optimizing On-Chip Nonvolatile Storage for Visual Multitask Inference at the Edge.
IEEE Micro, 2019

A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs.
IEEE J. Solid State Circuits, 2019

A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM.
CoRR, 2019

MLPerf Training Benchmark.
CoRR, 2019

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference.
CoRR, 2019

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning.
CoRR, 2019

The Architectural Implications of Facebook's DNN-based Personalized Recommendation.
CoRR, 2019

Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization.
IEEE Comput. Archit. Lett., 2019

A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Demystifying Bayesian Inference Workloads.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Application of Approximate Matrix Multiplication to Neural Networks and Distributed SLAM.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

Machine Learning at Facebook: Understanding Inference at the Edge.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Accelerating Bayesian Inference on Structured Graphs Using Parallel Gibbs Sampling.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

FlexGibbs: Reconfigurable Parallel Gibbs Sampling Accelerator for Structured Graphs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

MASR: A Modular Accelerator for Sparse RNNs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

An Area-Efficient 8-Bit Single-Ended ADC With Extended Input Voltage Range.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications.
IEEE J. Solid State Circuits, 2018

Cloud No Longer a Silver Bullet, Edge to the Rescue.
CoRR, 2018

Co-designed systems for deep learning hardware accelerators.
Proceedings of the 2018 International Symposium on VLSI Design, 2018

Weightless: Lossy weight encoding for deep neural network compression.
Proceedings of the 6th International Conference on Learning Representations, 2018

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET.
Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

Ares: a framework for quantifying the resilience of deep neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018

On-chip deep neural network storage with multi-level eNVM.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Deep Learning for Computer Architects
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01756-8, 2017

A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC-DC Converter.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Cognitive Computing Safety: The New Horizon for Reliability / The Design and Evolution of Deep Learning Workloads.
IEEE Micro, 2017

Ultra-Low-Power Processors.
IEEE Micro, 2017

2017 International Symposium on Computer Architecture Influential Paper Award.
IEEE Micro, 2017

A Fully Integrated Battery-Powered System-on-Chip in 40-nm CMOS for Closed-Loop Control of Insect-Scale Pico-Aerial Vehicle.
IEEE J. Solid State Circuits, 2017

CARB: A C-State Power Management Arbiter for Latency-Critical Workloads.
IEEE Comput. Archit. Lett., 2017

Automatically accelerating non-numerical programs by architecture-compiler co-design.
Commun. ACM, 2017

Methods and infrastructure in the era of accelerator-centric architectures.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

21.5 A 3-to-5V input 100Vpp output 57.7mW 0.42% THD+N highly integrated piezoelectric actuator driver.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A case for efficient accelerator design space exploration via Bayesian optimization.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Using dynamic dependence analysis to improve the quality of high-level synthesis designs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Applications of Deep Neural Networks for Ultra Low Power IoT.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017


Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators.
Proceedings of the 54th Annual Design Automation Conference, 2017

Mallacc: Accelerating Memory Allocation.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Sub-uJ deep neural networks for embedded applications.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Profiling a Warehouse-Scale Computer.
IEEE Micro, 2016

A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications.
IEEE J. Solid State Circuits, 2016

Co-designing accelerators and SoC interfaces using gem5-Aladdin.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Fathom: reference workloads for modern deep learning methods.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
Research Infrastructures for Hardware Accelerators
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01750-6, 2015

The Aladdin Approach to Accelerator Design and Modeling.
IEEE Micro, 2015

A multi-chip system optimized for insect-scale flapping-wing robots.
Proceedings of the Symposium on VLSI Circuits, 2015

A 16-core voltage-stacked system with an integrated switched-capacitor DC-DC converter.
Proceedings of the Symposium on VLSI Circuits, 2015

Circuit and system design for robotic flying vehicles.
Proceedings of the VLSI Design, Automation and Test, 2015

Addressing the computing technology-capability gap: The coming Golden Age of design.
Proceedings of the 10th IEEE International Conference on Networking, 2015

Quantifying sources of error in McPAT and potential impacts on architectural studies.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A power electronics unit to drive piezoelectric actuators for flying microrobots.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

HELIX-UP: relaxing program semantics to unleash parallelization.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014
Evaluating Adaptive Clocking for Supply-Noise Resilience in Battery-Powered Aerial Microrobotic System-on-Chip.
IEEE Trans. Circuits Syst. I Regul. Pap., 2014

Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

MachSuite: Benchmarks for accelerator design and customized architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Tradeoffs between power management and tail latency in warehouse-scale applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Multi-accelerator system development with the ShrinkFit acceleration framework.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

2013
Shrink-Fit: A Framework for Flexible Accelerator Sizing.
IEEE Comput. Archit. Lett., 2013

ISA-independent workload characterization and its implications for specialized architectures.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Characterizing and evaluating voltage noise in multi-core near-threshold processors.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Energy characterization and instruction-level energy model of Intel's Xeon Phi processor.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Quantifying acceleration: Power/performance trade-offs of application kernels in hardware.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Supply-noise resilient adaptive clocking for battery-powered aerial microrobotic System-on-Chip in 40nm CMOS.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

A fully integrated battery-connected switched-capacitor 4: 1 voltage regulator with 70% peak efficiency using bottom-plate charge recycling.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

2012
The accelerator store: A shared memory framework for accelerator-based systems.
ACM Trans. Archit. Code Optim., 2012

Helix: Making the Extraction of Thread-Level Parallelism Mainstream.
IEEE Micro, 2012

A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS.
IEEE J. Solid State Circuits, 2012

Evaluation of voltage stacking for near-threshold multicore computing.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

XIOSim: power-performance modeling of mobile x86 cores.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

The HELIX project: overview and directions.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

HELIX: automatic parallelization of irregular programs for chip multiprocessing.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

2011
Automating Design of Voltage Interpolation to Address Process Variations.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Resilient Architectures via Collaborative Design: Maximizing Commodity Processor Performance in the Presence of Variations.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Voltage Noise in Production Processors.
IEEE Micro, 2011

CPUs, GPUs, and Hybrid Computing.
IEEE Micro, 2011

An Accelerator-Based Wireless Sensor Network Processor in 130 nm CMOS.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

Hardware in the loop for optical flow sensing in a robotic bee.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Achieving uniform performance and maximizing throughput in the presence of heterogeneity.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Implementing a hybrid SRAM / eDRAM NUCA architecture.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Dimetrodon: processor-level preventive thermal management via idle cycle injection.
Proceedings of the 48th Design Automation Conference, 2011

The alarms project: A hardware/software approach to addressing parameter variations.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

2010
Eliminating voltage emergencies via software-guided code transformations.
ACM Trans. Archit. Code Optim., 2010

Applied inference: Case studies in microarchitectural design.
ACM Trans. Archit. Code Optim., 2010

Predicting Voltage Droops Using Recurring Program and Microarchitectural Event Activity.
IEEE Micro, 2010

Can Subthreshold and Near-Threshold Circuits Go Mainstream?
IEEE Micro, 2010

The Accelerator Store framework for high-performance, low-power accelerator-based systems.
IEEE Comput. Archit. Lett., 2010

Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

2009
Energy- and area-efficient architectures through application clustering and architectural heterogeneity.
ACM Trans. Archit. Code Optim., 2009

Revival: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency.
IEEE Micro, 2009

Tribeca: design for PVT variations with local recovery and fine-grained adaptation.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Place and route considerations for voltage interpolated designs.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

The design of a bloom filter hardware accelerator for ultra low power systems.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Thread motion: fine-grained power management for multi-core systems.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Empirical performance models for 3T1D memories.
Proceedings of the 27th International Conference on Computer Design, 2009

Design and test strategies for microarchitectural post-fabrication tuning.
Proceedings of the 27th International Conference on Computer Design, 2009

Voltage emergency prediction: Using signatures to reduce operating margins.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

An event-guided approach to reducing voltage noise in processors.
Proceedings of the Design, Automation and Test in Europe, 2009

Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack.
Proceedings of the 46th Design Automation Conference, 2009

An accelerator-based wireless sensor network processor in 130nm CMOS.
Proceedings of the 2009 International Conference on Compilers, 2009

2008
Replacing 6T SRAMs with 3T1D DRAMs in the L1 Data Cache to Combat Process Variability.
IEEE Micro, 2008

Guest Editors' Introduction: Top Picks from the Computer Architecture Conferences of 2007.
IEEE Micro, 2008

Survey of Hardware Systems for Wireless Sensor Networks.
J. Low Power Electron., 2008

CPR: Composable performance regression for scalable multiprocessor models.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

A Process-Variation-Tolerant Floating-Point Unit with Voltage Interpolation and Variable Latency.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

Instruction-driven clock scheduling with glitch mitigation.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

System design considerations for sensor network applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

Evaluation of voltage interpolation to address process variations.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

Roughness of microarchitectural design topologies and its implications for optimization.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

System level analysis of fast, per-core DVFS using on-chip switching regulators.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Efficiency trends and limits from comprehensive microarchitectural adaptivity.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Spatial Sampling and Regression Strategies.
IEEE Micro, 2007

Power, Thermal, and Reliability Modeling in Nanometer-Scale Microprocessors.
IEEE Micro, 2007

Methods of inference and learning for performance modeling of parallel applications.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Process Variation Tolerant 3T1D-Based Cache Architectures.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Towards a software approach to mitigate voltage emergencies.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Architectural power models for SRAM and CAM structures based on hybrid analytical/empirical techniques.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

Illustrative Design Space Studies with Microarchitectural Regression Models.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Understanding voltage variations in chip multiprocessors using a distributed power-delivery network.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

2006
Dynamic-Compiler-Driven Control for Microprocessor Energy and Performance.
IEEE Micro, 2006

Mitigating the Impact of Process Variations on Processor Register Files and Execution Units.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

System-on-Chip Architecture Design for Intelligent Sensor Networks.
Proceedings of the Second International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), 2006

Microarchitecture parameter selection to optimize system performance under process variation.
Proceedings of the 2006 International Conference on Computer-Aided Design, 2006

CMP design space exploration subject to physical constraints.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Efficient architectures through application clustering and architectural heterogeneity.
Proceedings of the 2006 International Conference on Compilers, 2006

Architecture and circuit techniques for low-throughput, energy-constrained systems across technology generations.
Proceedings of the 2006 International Conference on Compilers, 2006

Accurate and efficient regression modeling for microarchitectural performance and power prediction.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Power and thermal effects of SRAM vs. Latch-Mux design styles and clock gating choices.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

An Ultra Low Power System Architecture for Sensor Network Applications.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004
Integrated Analysis of Power and Performance for Pipelined Microprocessors.
IEEE Trans. Computers, 2004

Power-performance simulation: design and validation strategies.
SIGMETRICS Perform. Evaluation Rev., 2004

TinyBench: The Case For A Standardized Benchmark Suite for TinyOS Based Wireless Sensor Network Devices.
Proceedings of the 29th Annual IEEE Conference on Local Computer Networks (LCN 2004), 2004

Understanding the energy efficiency of simultaneous multithreading.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Eliminating voltage emergencies via microarchitectural voltage control feedback and dynamic optimization.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

Evaluating Techniques for Exploiting Instruction Slack.
Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

2003
New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors.
IBM J. Res. Dev., 2003

Control Techniques to Eliminate Voltage Emergencies in High Performance Processors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2002
Early-Stage Definition of LPX: A Low Power Issue-Execute Processor.
Proceedings of the Power-Aware Computer Systems, Second International Workshop, 2002

Optimizing pipelines for power and performance.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

2001
Dynamic Thermal Management for High-Performance Microprocessors.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

A circuit level implementation of an adaptive issue queue for power-aware microprocessors.
Proceedings of the 11th ACM Great Lakes Symposium on VLSI 2001, 2001

2000
Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance.
ACM Trans. Comput. Syst., 2000

Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors.
IEEE Micro, 2000

An Adaptive Issue Queue for Reduced Power at High Performance.
Proceedings of the Power-Aware Computer Systems, First International Workshop, 2000

Power-Performance Modeling and Tradeoff Analysis for a High End Microprocessor.
Proceedings of the Power-Aware Computer Systems, First International Workshop, 2000

Wattch: a framework for architectural-level power analysis and optimizations.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

