Gu-Yeon Wei

ORCID: 0000-0001-5730-9904

According to our database, Gu-Yeon Wei authored at least 222 papers between 1996 and 2024.

Bibliography

2024
Silent Data Corruption in Robot Operating System: A Case for End-to-End System-Level Fault Analysis Using Autonomous UAVs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., April, 2024

Application-level Validation of Accelerator Designs Using a Formal Software/Hardware Interface.
ACM Trans. Design Autom. Electr. Syst., March, 2024

Guac: Energy-Aware and SSA-Based Generation of Coarse-Grained Merged Accelerators from LLVM-IR.
CoRR, 2024

Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU.
CoRR, 2024


CAMEL: Co-Designing AI Models and eDRAMs for Efficient On-Device Learning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

GPU-based Private Information Retrieval for On-Device Machine Learning Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
SoCProbe: Compositional Post-Silicon Validation of Heterogeneous NoC-Based SoCs.
IEEE Des. Test, December, 2023

Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials.
Int. J. High Perform. Comput. Appl., July, 2023

Early DSE and Automatic Generation of Coarse-grained Merged Accelerators.
ACM Trans. Embed. Comput. Syst., March, 2023

A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs.
IEEE J. Solid State Circuits, February, 2023

A Binary-Activation, Multi-Level Weight RNN and Training Algorithm for ADC-/DAC-Free and Noise-Resilient Processing-in-Memory Inference With eNVM.
IEEE Trans. Emerg. Top. Comput., 2023

Trireme: Exploration of Hierarchical Multi-level Parallelism for Hardware Acceleration.
ACM Trans. Embed. Comput. Syst., 2023

Architectural CO<sub>2</sub> Footprint Tool: Designing Sustainable Computer Systems With an Architectural Carbon Modeling Tool.
IEEE Micro, 2023

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation.
CoRR, 2023

Hardware Resilience Properties of Text-Guided Image Classifiers.
CoRR, 2023

MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems.
CoRR, 2023

Guess & Sketch: Language Model Guided Transpilation.
CoRR, 2023

INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation.
CoRR, 2023

CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning.
CoRR, 2023

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems.
CoRR, 2023

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices.
CoRR, 2023

Hardware Resilience Properties of Text-Guided Image Classifiers.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

S<sup>3</sup>: Increasing GPU Utilization during Generative Inference for Higher Throughput.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

Is the Future Cold or Tall? Design Space Exploration of Cryogenic and 3D Embedded Cache Memory.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Characterizing the Scalability of Graph Convolutional Networks on Intel<sup>®</sup> PIUMA.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

VelociTI: An Architecture-level Performance Modeling Framework for Trapped Ion Quantum Computers.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Carbon-Efficient Design Optimization for Computing Systems.
Proceedings of the 2nd Workshop on Sustainable Computer Systems, 2023

MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
End-to-End Synthesis of Dynamically Controlled Machine Learning Accelerators.
IEEE Trans. Computers, 2022

Chasing Carbon: The Elusive Environmental Footprint of Computing.
IEEE Micro, 2022

Bridging Python to Silicon: The SODA Toolchain.
IEEE Micro, 2022

SMIV: A 16-nm 25-mm² SoC for IoT With Arm Cortex-A53, eFPGA, and Coherent Accelerators.
IEEE J. Solid State Circuits, 2022

Architectural Implications of Embedding Dimension during GCN on CPU and GPU.
CoRR, 2022

Impala: Low-Latency, Communication-Efficient Private Deep Learning Inference.
CoRR, 2022

Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference.
CoRR, 2022

Specialized Accelerators and Compiler Flows: Replacing Accelerator APIs with a Formal Software/Hardware Interface.
CoRR, 2022

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration.
CoRR, 2022

Automatic Domain-Specific SoC Design for Autonomous Unmanned Aerial Vehicles.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

ACT: designing sustainable computer systems with an architectural carbon modeling tool.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

A Scalable Methodology for Agile Chip Development with Open-Source Hardware Components.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

CoopMC: Algorithm-Architecture Co-Optimization for Markov Chain Monte Carlo Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

From High-Level Frameworks to custom Silicon with SODA.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

A 12nm Agile-Designed SoC for Swarm-Based Perception with Heterogeneous IP Blocks, a Reconfigurable Memory Hierarchy, and an 800MHz Multi-Plane NoC.
Proceedings of the 48th IEEE European Solid State Circuits Conference, 2022

GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators.
Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

A joint management middleware to improve training performance of deep recommendation systems with SSDs.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.
CoRR, 2021

Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models.
CoRR, 2021

Machine Learning-Based Automated Design Space Exploration for Autonomous Aerial Robots.
CoRR, 2021

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

9.8 A 25mm<sup>2</sup> SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021

Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix.
Proceedings of the 38th International Conference on Machine Learning, 2021

Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

SM6: A 16nm System-on-Chip for Accurate and Noise-Robust Attention-Based NLP Applications: The 33<sup>rd</sup> Hot Chips Symposium - August 22-24, 2021.
Proceedings of the IEEE Hot Chips 33 Symposium, 2021

RecSSD: near data processing for solid state drive based recommendation inference.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Towards Automatic and Agile AI/ML Accelerator Design with End-to-End Synthesis.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads.
ACM Trans. Archit. Code Optim., 2020

CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development.
IEEE Micro, 2020

MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.
IEEE Micro, 2020

EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP.
CoRR, 2020

Cheetah: Optimizations and Methods for Privacy-Preserving Inference via Homomorphic Encryption.
CoRR, 2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development.
CoRR, 2020

The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines.
IEEE Comput. Archit. Lett., 2020

A 3mm<sup>2</sup> Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020

A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms.
Proceedings of Machine Learning and Systems 2020, 2020


A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Cross-Stack Workload Characterization of Deep Recommendation Systems.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

SODA: a New Synthesis Infrastructure for Agile Hardware Design of Machine Learning Accelerators.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

A Scalable Bayesian Inference Accelerator for Unsupervised Learning.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

Invited: Software Defined Accelerators From Learning Tools Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Predicting New Workload or CPU Performance by Analyzing Public Datasets.
ACM Trans. Archit. Code Optim., 2019

MEMTI: Optimizing On-Chip Nonvolatile Storage for Visual Multitask Inference at the Edge.
IEEE Micro, 2019

A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs.
IEEE J. Solid State Circuits, 2019

A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM.
CoRR, 2019

MLPerf Training Benchmark.
CoRR, 2019

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference.
CoRR, 2019

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning.
CoRR, 2019

Learning Low-Rank Approximation for CNNs.
CoRR, 2019

Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks.
CoRR, 2019

Network Pruning for Low-Rank Binary Indexing.
CoRR, 2019

Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization.
IEEE Comput. Archit. Lett., 2019

A 16nm 25mm<sup>2</sup> SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Demystifying Bayesian Inference Workloads.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Accelerating Bayesian Inference on Structured Graphs Using Parallel Gibbs Sampling.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

FlexGibbs: Reconfigurable Parallel Gibbs Sampling Accelerator for Structured Graphs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

MASR: A Modular Accelerator for Sparse RNNs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

An Area-Efficient 8-Bit Single-Ended ADC With Extended Input Voltage Range.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications.
IEEE J. Solid State Circuits, 2018

Cloud No Longer a Silver Bullet, Edge to the Rescue.
CoRR, 2018

Weightless: Lossy weight encoding for deep neural network compression.
Proceedings of the 6th International Conference on Learning Representations, 2018

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET.
Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

Ares: a framework for quantifying the resilience of deep neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018

On-chip deep neural network storage with multi-level eNVM.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Deep Learning for Computer Architects.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01756-8, 2017

A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC-DC Converter.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Cognitive Computing Safety: The New Horizon for Reliability / The Design and Evolution of Deep Learning Workloads.
IEEE Micro, 2017

A Fully Integrated Battery-Powered System-on-Chip in 40-nm CMOS for Closed-Loop Control of Insect-Scale Pico-Aerial Vehicle.
IEEE J. Solid State Circuits, 2017

Automatically accelerating non-numerical programs by architecture-compiler co-design.
Commun. ACM, 2017

Methods and infrastructure in the era of accelerator-centric architectures.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

21.5 A 3-to-5V input 100Vpp output 57.7mW 0.42% THD+N highly integrated piezoelectric actuator driver.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A case for efficient accelerator design space exploration via Bayesian optimization.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Using dynamic dependence analysis to improve the quality of high-level synthesis designs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Applications of Deep Neural Networks for Ultra Low Power IoT.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017


Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators.
Proceedings of the 54th Annual Design Automation Conference, 2017

Mallacc: Accelerating Memory Allocation.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Sub-uJ deep neural networks for embedded applications.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Profiling a Warehouse-Scale Computer.
IEEE Micro, 2016

A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications.
IEEE J. Solid State Circuits, 2016

Co-designing accelerators and SoC interfaces using gem5-Aladdin.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Fathom: reference workloads for modern deep learning methods.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
The Aladdin Approach to Accelerator Design and Modeling.
IEEE Micro, 2015

A multi-chip system optimized for insect-scale flapping-wing robots.
Proceedings of the Symposium on VLSI Circuits, 2015

A 16-core voltage-stacked system with an integrated switched-capacitor DC-DC converter.
Proceedings of the Symposium on VLSI Circuits, 2015

Quantifying sources of error in McPAT and potential impacts on architectural studies.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A power electronics unit to drive piezoelectric actuators for flying microrobots.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

HELIX-UP: relaxing program semantics to unleash parallelization.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014
ADC-Based Backplane Receiver Design-Space Exploration.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Evaluating Adaptive Clocking for Supply-Noise Resilience in Battery-Powered Aerial Microrobotic System-on-Chip.
IEEE Trans. Circuits Syst. I Regul. Pap., 2014

Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

MachSuite: Benchmarks for accelerator design and customized architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Tradeoffs between power management and tail latency in warehouse-scale applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Multi-accelerator system development with the ShrinkFit acceleration framework.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

2013
Shrink-Fit: A Framework for Flexible Accelerator Sizing.
IEEE Comput. Archit. Lett., 2013

Characterizing and evaluating voltage noise in multi-core near-threshold processors.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Quantifying acceleration: Power/performance trade-offs of application kernels in hardware.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Supply-noise resilient adaptive clocking for battery-powered aerial microrobotic System-on-Chip in 40nm CMOS.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

A fully integrated battery-connected switched-capacitor 4: 1 voltage regulator with 70% peak efficiency using bottom-plate charge recycling.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

2012
The accelerator store: A shared memory framework for accelerator-based systems.
ACM Trans. Archit. Code Optim., 2012

Helix: Making the Extraction of Thread-Level Parallelism Mainstream.
IEEE Micro, 2012

A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS.
IEEE J. Solid State Circuits, 2012

Evaluation of voltage stacking for near-threshold multicore computing.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

XIOSim: power-performance modeling of mobile x86 cores.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

The HELIX project: overview and directions.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

HELIX: automatic parallelization of irregular programs for chip multiprocessing.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

2011
Automating Design of Voltage Interpolation to Address Process Variations.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Voltage Noise in Production Processors.
IEEE Micro, 2011

An Accelerator-Based Wireless Sensor Network Processor in 130 nm CMOS.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

Hardware in the loop for optical flow sensing in a robotic bee.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Achieving uniform performance and maximizing throughput in the presence of heterogeneity.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Area efficient phase calibration of a 1.6 GHz multiphase DLL.
Proceedings of the 2011 IEEE Custom Integrated Circuits Conference, 2011

2010
Eliminating voltage emergencies via software-guided code transformations.
ACM Trans. Archit. Code Optim., 2010

Predicting Voltage Droops Using Recurring Program and Microarchitectural Event Activity.
IEEE Micro, 2010

The Accelerator Store framework for high-performance, low-power accelerator-based systems.
IEEE Comput. Archit. Lett., 2010

Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Energetics of flapping-wing robotic insects: towards autonomous hovering flight.
Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

2009
Revival: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency.
IEEE Micro, 2009

An 8×5 Gb/s Parallel Receiver With Collaborative Timing Recovery.
IEEE J. Solid State Circuits, 2009

Tribeca: design for PVT variations with local recovery and fine-grained adaptation.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Place and route considerations for voltage interpolated designs.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

Thread motion: fine-grained power management for multi-core systems.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Milligram-scale high-voltage power electronics for piezoelectric microrobots.
Proceedings of the 2009 IEEE International Conference on Robotics and Automation, 2009

Empirical performance models for 3T1D memories.
Proceedings of the 27th International Conference on Computer Design, 2009

Design and test strategies for microarchitectural post-fabrication tuning.
Proceedings of the 27th International Conference on Computer Design, 2009

Voltage emergency prediction: Using signatures to reduce operating margins.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

An event-guided approach to reducing voltage noise in processors.
Proceedings of the Design, Automation and Test in Europe, 2009

Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack.
Proceedings of the 46th Design Automation Conference, 2009

Digital wireline and PLL techniques.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

Design-space exploration of backplane receivers with high-speed ADCs and digital equalization.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

An accelerator-based wireless sensor network processor in 130nm CMOS.
Proceedings of the 2009 International Conference on Compilers, 2009

2008
Replacing 6T SRAMs with 3T1D DRAMs in the L1 Data Cache to Combat Process Variability.
IEEE Micro, 2008

A High-Throughput Maximum a Posteriori Probability Detector.
IEEE J. Solid State Circuits, 2008

A Highly Digital MDLL-Based Clock Multiplier That Leverages a Self-Scrambling Time-to-Digital Converter to Achieve Subpicosecond Jitter Performance.
IEEE J. Solid State Circuits, 2008

A Wide-Tracking Range Clock and Data Recovery Circuit.
IEEE J. Solid State Circuits, 2008

A Sub-Picosecond Resolution 0.5-1.5 GHz Digital-to-Phase Converter.
IEEE J. Solid State Circuits, 2008

Survey of Hardware Systems for Wireless Sensor Networks.
J. Low Power Electron., 2008

A Process-Variation-Tolerant Floating-Point Unit with Voltage Interpolation and Variable Latency.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

An 8×3.2Gb/s Parallel Receiver with Collaborative Timing Recovery.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

Instruction-driven clock scheduling with glitch mitigation.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

Design of low-power short-distance opto-electronic transceiver front-ends with scalable supply voltages and frequencies.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

System design considerations for sensor network applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

A review of actuation and power electronics options for flapping-wing robotic insects.
Proceedings of the 2008 IEEE International Conference on Robotics and Automation, 2008

Evaluation of voltage interpolation to address process variations.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

System level analysis of fast, per-core DVFS using on-chip switching regulators.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

A 12.5-Gbps, 7-bit transmit DAC with 4-tap LUT-based equalization in 0.13μm CMOS.
Proceedings of the IEEE 2008 Custom Integrated Circuits Conference, 2008

A 8×5 Gb/s source-synchronous receiver with clock generator phase error correction.
Proceedings of the IEEE 2008 Custom Integrated Circuits Conference, 2008

2007
Process Variation Tolerant 3T1D-Based Cache Architectures.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Towards a software approach to mitigate voltage emergencies.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Serial Sum-Product Architecture for Low-Density Parity-Check Codes.
Proceedings of the 16th International Conference on Computer Communications and Networks, 2007

A Bit-Node Centric Architecture for Low-Density Parity-Check Decoders.
Proceedings of the Global Communications Conference, 2007

Understanding voltage variations in chip multiprocessors using a distributed power-delivery network.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Digitally-Enhanced Phase-Locking Circuits.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

A Comprehensive Phase-Transfer Model for Delay-Locked Loops.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006
System-on-Chip Architecture Design for Intelligent Sensor Networks.
Proceedings of the Second International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), 2006

Adaptive-Bandwidth Mixing PLL/DLL Based Multi-Phase Clock Generator for Optimal Jitter Performance.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Phase Mismatch Detection and Compensation for PLL/DLL Based Multi-Phase Clock Generator.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

A 1.6Gbps Digital Clock and Data Recovery Circuit.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Architecture and circuit techniques for low-throughput, energy-constrained systems across technology generations.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
An Ultra Low Power System Architecture for Sensor Network Applications.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Exploring the Design Space of Power-Aware Opto-Electronic Networked Systems.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004
Pipelined parallel architecture for high throughput MAP detectors.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Jitter in high-speed serial and parallel links.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

A mixed PLL/DLL architecture for low jitter clock generation.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

2003
Guest editorial.
IEEE Trans. Circuits Syst. II Express Briefs, 2003

Design of CMOS adaptive-bandwidth PLL/DLLs: a general approach.
IEEE Trans. Circuits Syst. II Express Briefs, 2003

Analysis of PLL clock jitter in high-speed serial links.
IEEE Trans. Circuits Syst. II Express Briefs, 2003

An adaptive PAM-4 5-Gb/s backplane transceiver in 0.25-μm CMOS.
IEEE J. Solid State Circuits, 2003

2002
An adaptive PAM-4 5 Gb/s backplane transceiver in 0.25 μm CMOS.
Proceedings of the IEEE 2002 Custom Integrated Circuits Conference, 2002

2000
A variable-frequency parallel I/O interface with adaptive power-supply regulation.
IEEE J. Solid State Circuits, 2000

1999
A fully digital, energy-efficient, adaptive power-supply regulator.
IEEE J. Solid State Circuits, 1999

1996
A low power switching power supply for self-clocked systems.
Proceedings of the 1996 International Symposium on Low Power Electronics and Design, 1996

