Gu-Yeon Wei

According to our database, Gu-Yeon Wei authored at least 153 papers between 1996 and 2020.

Bibliography

2020
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads.
ACM Trans. Archit. Code Optim., 2020

CHIPKIT: An Agile, Reusable Open-Source Framework for Rapid Test Chip Development.
IEEE Micro, 2020

MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance.
IEEE Micro, 2020

EdgeBERT: Optimizing On-Chip Inference for Multi-Task NLP.
CoRR, 2020

Chasing Carbon: The Elusive Environmental Footprint of Computing.
CoRR, 2020

Cheetah: Optimizations and Methods for Privacy-Preserving Inference via Homomorphic Encryption.
CoRR, 2020

CHIPKIT: An agile, reusable open-source framework for rapid test chip development.
CoRR, 2020

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference.
CoRR, 2020

The Sky Is Not the Limit: A Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines.
IEEE Comput. Archit. Lett., 2020

A 3mm² Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020

A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms.
Proceedings of Machine Learning and Systems 2020, 2020

A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs.
Proceedings of the ISLPED '20: ACM/IEEE International Symposium on Low Power Electronics and Design, 2020

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Cross-Stack Workload Characterization of Deep Recommendation Systems.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

SODA: a New Synthesis Infrastructure for Agile Hardware Design of Machine Learning Accelerators.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

A Scalable Bayesian Inference Accelerator for Unsupervised Learning.
Proceedings of the IEEE Hot Chips 32 Symposium, 2020

Invited: Software Defined Accelerators From Learning Tools Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Predicting New Workload or CPU Performance by Analyzing Public Datasets.
ACM Trans. Archit. Code Optim., 2019

MEMTI: Optimizing On-Chip Nonvolatile Storage for Visual Multitask Inference at the Edge.
IEEE Micro, 2019

A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs.
IEEE J. Solid State Circuits, 2019

SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads.
CoRR, 2019

A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM.
CoRR, 2019

MLPerf Training Benchmark.
CoRR, 2019

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference.
CoRR, 2019

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning.
CoRR, 2019

Learning Low-Rank Approximation for CNNs.
CoRR, 2019

Structured Compression by Unstructured Pruning for Sparse Quantized Neural Networks.
CoRR, 2019

Network Pruning for Low-Rank Binary Indexing.
CoRR, 2019

Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization.
IEEE Comput. Archit. Lett., 2019

A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Demystifying Bayesian Inference Workloads.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Accelerating Bayesian Inference on Structured Graphs Using Parallel Gibbs Sampling.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

FlexGibbs: Reconfigurable Parallel Gibbs Sampling Accelerator for Structured Graphs.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

MASR: A Modular Accelerator for Sparse RNNs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

An Area-Efficient 8-Bit Single-Ended ADC With Extended Input Voltage Range.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications.
IEEE J. Solid State Circuits, 2018

Cloud No Longer a Silver Bullet, Edge to the Rescue.
CoRR, 2018

Weightless: Lossy weight encoding for deep neural network compression.
Proceedings of the 6th International Conference on Learning Representations, 2018

A Wide Dynamic Range Sparse FC-DNN Processor with Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET.
Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

Ares: a framework for quantifying the resilience of deep neural networks.
Proceedings of the 55th Annual Design Automation Conference, 2018

On-chip deep neural network storage with multi-level eNVM.
Proceedings of the 55th Annual Design Automation Conference, 2018

2017
Deep Learning for Computer Architects
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2017

A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC-DC Converter.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Cognitive Computing Safety: The New Horizon for Reliability / The Design and Evolution of Deep Learning Workloads.
IEEE Micro, 2017

A Fully Integrated Battery-Powered System-on-Chip in 40-nm CMOS for Closed-Loop Control of Insect-Scale Pico-Aerial Vehicle.
IEEE J. Solid State Circuits, 2017

Automatically accelerating non-numerical programs by architecture-compiler co-design.
Commun. ACM, 2017

Methods and infrastructure in the era of accelerator-centric architectures.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

21.5 A 3-to-5V input 100Vpp output 57.7mW 0.42% THD+N highly integrated piezoelectric actuator driver.
Proceedings of the 2017 IEEE International Solid-State Circuits Conference, 2017

A case for efficient accelerator design space exploration via Bayesian optimization.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Using dynamic dependence analysis to improve the quality of high-level synthesis designs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Applications of Deep Neural Networks for Ultra Low Power IoT.
Proceedings of the 2017 IEEE International Conference on Computer Design, 2017

Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators.
Proceedings of the 54th Annual Design Automation Conference, 2017

Mallacc: Accelerating Memory Allocation.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Sub-uJ deep neural networks for embedded applications.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2016
Profiling a Warehouse-Scale Computer.
IEEE Micro, 2016

A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications.
IEEE J. Solid State Circuits, 2016

Co-designing accelerators and SoC interfaces using gem5-Aladdin.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Fathom: reference workloads for modern deep learning methods.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
The Aladdin Approach to Accelerator Design and Modeling.
IEEE Micro, 2015

A multi-chip system optimized for insect-scale flapping-wing robots.
Proceedings of the Symposium on VLSI Circuits, 2015

A 16-core voltage-stacked system with an integrated switched-capacitor DC-DC converter.
Proceedings of the Symposium on VLSI Circuits, 2015

Quantifying sources of error in McPAT and potential impacts on architectural studies.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A power electronics unit to drive piezoelectric actuators for flying microrobots.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

HELIX-UP: relaxing program semantics to unleash parallelization.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014
ADC-Based Backplane Receiver Design-Space Exploration.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Evaluating Adaptive Clocking for Supply-Noise Resilience in Battery-Powered Aerial Microrobotic System-on-Chip.
IEEE Trans. Circuits Syst. I Regul. Pap., 2014

Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

MachSuite: Benchmarks for accelerator design and customized architectures.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Tradeoffs between power management and tail latency in warehouse-scale applications.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Multi-accelerator system development with the ShrinkFit acceleration framework.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

2013
Shrink-Fit: A Framework for Flexible Accelerator Sizing.
IEEE Comput. Archit. Lett., 2013

Characterizing and evaluating voltage noise in multi-core near-threshold processors.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Quantifying acceleration: Power/performance trade-offs of application kernels in hardware.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

Supply-noise resilient adaptive clocking for battery-powered aerial microrobotic System-on-Chip in 40nm CMOS.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

A fully integrated battery-connected switched-capacitor 4:1 voltage regulator with 70% peak efficiency using bottom-plate charge recycling.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

2012
The accelerator store: A shared memory framework for accelerator-based systems.
ACM Trans. Archit. Code Optim., 2012

Helix: Making the Extraction of Thread-Level Parallelism Mainstream.
IEEE Micro, 2012

A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS.
IEEE J. Solid State Circuits, 2012

Evaluation of voltage stacking for near-threshold multicore computing.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

XIOSim: power-performance modeling of mobile x86 cores.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

The HELIX project: overview and directions.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

HELIX: automatic parallelization of irregular programs for chip multiprocessing.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

2011
Automating Design of Voltage Interpolation to Address Process Variations.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Voltage Noise in Production Processors.
IEEE Micro, 2011

An Accelerator-Based Wireless Sensor Network Processor in 130 nm CMOS.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

A fully-integrated 3-level DC/DC converter for nanosecond-scale DVS with fast shunt regulation.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

Hardware in the loop for optical flow sensing in a robotic bee.
Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011

Achieving uniform performance and maximizing throughput in the presence of heterogeneity.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Area efficient phase calibration of a 1.6 GHz multiphase DLL.
Proceedings of the 2011 IEEE Custom Integrated Circuits Conference, 2011

2010
Eliminating voltage emergencies via software-guided code transformations.
ACM Trans. Archit. Code Optim., 2010

Predicting Voltage Droops Using Recurring Program and Microarchitectural Event Activity.
IEEE Micro, 2010

The Accelerator Store framework for high-performance, low-power accelerator-based systems.
IEEE Comput. Archit. Lett., 2010

Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Energetics of flapping-wing robotic insects: towards autonomous hovering flight.
Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

2009
Revival: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency.
IEEE Micro, 2009

An 8×5 Gb/s Parallel Receiver With Collaborative Timing Recovery.
IEEE J. Solid State Circuits, 2009

Tribeca: design for PVT variations with local recovery and fine-grained adaptation.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Place and route considerations for voltage interpolated designs.
Proceedings of the 10th International Symposium on Quality of Electronic Design (ISQED 2009), 2009

Thread motion: fine-grained power management for multi-core systems.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Milligram-scale high-voltage power electronics for piezoelectric microrobots.
Proceedings of the 2009 IEEE International Conference on Robotics and Automation, 2009

Empirical performance models for 3T1D memories.
Proceedings of the 27th International Conference on Computer Design, 2009

Design and test strategies for microarchitectural post-fabrication tuning.
Proceedings of the 27th International Conference on Computer Design, 2009

Voltage emergency prediction: Using signatures to reduce operating margins.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

An event-guided approach to reducing voltage noise in processors.
Proceedings of the Design, Automation and Test in Europe, 2009

Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack.
Proceedings of the 46th Design Automation Conference, 2009

Digital wireline and PLL techniques.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

Design-space exploration of backplane receivers with high-speed ADCs and digital equalization.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2009

An accelerator-based wireless sensor network processor in 130nm CMOS.
Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, 2009

2008
Replacing 6T SRAMs with 3T1D DRAMs in the L1 Data Cache to Combat Process Variability.
IEEE Micro, 2008

A High-Throughput Maximum a Posteriori Probability Detector.
IEEE J. Solid State Circuits, 2008

A Highly Digital MDLL-Based Clock Multiplier That Leverages a Self-Scrambling Time-to-Digital Converter to Achieve Subpicosecond Jitter Performance.
IEEE J. Solid State Circuits, 2008

A Wide-Tracking Range Clock and Data Recovery Circuit.
IEEE J. Solid State Circuits, 2008

A Sub-Picosecond Resolution 0.5-1.5 GHz Digital-to-Phase Converter.
IEEE J. Solid State Circuits, 2008

Survey of Hardware Systems for Wireless Sensor Networks.
J. Low Power Electron., 2008

A Process-Variation-Tolerant Floating-Point Unit with Voltage Interpolation and Variable Latency.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

An 8×3.2Gb/s Parallel Receiver with Collaborative Timing Recovery.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

Instruction-driven clock scheduling with glitch mitigation.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

Design of low-power short-distance opto-electronic transceiver front-ends with scalable supply voltages and frequencies.
Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

System design considerations for sensor network applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

A review of actuation and power electronics options for flapping-wing robotic insects.
Proceedings of the 2008 IEEE International Conference on Robotics and Automation, 2008

Evaluation of voltage interpolation to address process variations.
Proceedings of the 2008 International Conference on Computer-Aided Design, 2008

System level analysis of fast, per-core DVFS using on-chip switching regulators.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

A 12.5-Gbps, 7-bit transmit DAC with 4-tap LUT-based equalization in 0.13μm CMOS.
Proceedings of the IEEE 2008 Custom Integrated Circuits Conference, 2008

A 8×5 Gb/s source-synchronous receiver with clock generator phase error correction.
Proceedings of the IEEE 2008 Custom Integrated Circuits Conference, 2008

2007
Process Variation Tolerant 3T1D-Based Cache Architectures.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Towards a software approach to mitigate voltage emergencies.
Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Serial Sum-Product Architecture for Low-Density Parity-Check Codes.
Proceedings of the 16th International Conference on Computer Communications and Networks, 2007

A Bit-Node Centric Architecture for Low-Density Parity-Check Decoders.
Proceedings of the Global Communications Conference, 2007

Understanding voltage variations in chip multiprocessors using a distributed power-delivery network.
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Digitally-Enhanced Phase-Locking Circuits.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

A Comprehensive Phase-Transfer Model for Delay-Locked Loops.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006
System-on-Chip Architecture Design for Intelligent Sensor Networks.
Proceedings of the Second International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), 2006

Adaptive-Bandwidth Mixing PLL/DLL Based Multi-Phase Clock Generator for Optimal Jitter Performance.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Phase Mismatch Detection and Compensation for PLL/DLL Based Multi-Phase Clock Generator.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

A 1.6Gbps Digital Clock and Data Recovery Circuit.
Proceedings of the IEEE 2006 Custom Integrated Circuits Conference, 2006

Architecture and circuit techniques for low-throughput, energy-constrained systems across technology generations.
Proceedings of the 2006 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, 2006

2005
An Ultra Low Power System Architecture for Sensor Network Applications.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Exploring the Design Space of Power-Aware Opto-Electronic Networked Systems.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004
Pipelined parallel architecture for high throughput MAP detectors.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

Jitter in high-speed serial and parallel links.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

A mixed PLL/DLL architecture for low jitter clock generation.
Proceedings of the 2004 International Symposium on Circuits and Systems, 2004

1996
A low power switching power supply for self-clocked systems.
Proceedings of the 1996 International Symposium on Low Power Electronics and Design, 1996