Nam Sung Kim

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Doing more with less: training large DNN models on commodity servers for the masses.

Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021

Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond.

Proceedings of the IEEE Hot Chips 33 Symposium, 2021

DiAG: a dataflow-inspired architecture for general-purpose processors.

Dong Kai Wang

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

Errata to "Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters".

IEEE Trans. Parallel Distributed Syst., 2020

Design and Implementation of SSD-Assisted Backup and Recovery for Database Systems.

IEEE Trans. Knowl. Data Eng., 2020

IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform.

Tsung-Yuan Charlie Tai

CoRR, 2020

FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack.

IEEE Comput. Archit. Lett., 2020

Network Packet Processing Mode-Aware Power Management for Data Center Servers.

IEEE Comput. Archit. Lett., 2020

Leveraging Dynamic Partial Reconfiguration with Scalable ILP Based Task Scheduling.

Proceedings of the 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems, 2020

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks.

Brahmendra Reddy Yatham

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

22.1 A 1.1V 16GB 640GB/s HBM2E DRAM with a Data-Bus Window-Extension Technique and a Synergetic On-Die ECC Scheme.

Proceedings of the 2020 IEEE International Solid-State Circuits Conference, 2020

Data Direct I/O Characterization for Future I/O System Exploration.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Bit-Parallel Vector Composability for Neural Acceleration.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters.

IEEE Trans. Parallel Distributed Syst., 2019

An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns.

ACM Trans. Archit. Code Optim., 2019

An Energy-Efficient Programmable Mixed-Signal Accelerator for Machine Learning Algorithms.

IEEE Micro, 2019

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.

CoRR, 2019

Exploiting OS-Level Memory Offlining for DRAM Power Management.

Seunghak Lee

Daehoon Kim

IEEE Comput. Archit. Lett., 2019

Ghost routers: energy-efficient asymmetric multicore processors with symmetric NoCs.

Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

NetDIMM: Low-Latency Near-Memory Network Interface Architecture.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

SMART: STT-MRAM architecture for smart activation and sensing.

Proceedings of the International Symposium on Memory Systems, 2019

Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads.

Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

A²M: Approximate Algebraic Memory Using Polynomials Rings.

Dong Kai Wang

Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

AxMemo: hardware-compiler co-design for approximate code memoization.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

LL-PCM: Low-Latency Phase Change Memory Architecture.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Practical Near-Data Processing to Evolve Memory and Storage Devices into Mainstream Heterogeneous Computing Systems.

Pankaj Mehra

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy.

Ahmed H. M. O. Abulila

Vikram Sharma Mailthody

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Approximate Ultra-Low Voltage Many-Core Processor Design.

Proceedings of the Approximate Circuits, Methodologies and CAD, 2019

2018

CNFET-Based High Throughput SIMD Architecture.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning.

IEEE Micro, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.

CoRR, 2018

Semi-Coherent DMA: An Alternative I/O Coherency Management for Embedded Systems.

IEEE Comput. Archit. Lett., 2018

SimpleSSD: Modeling Solid State Drives for Holistic System Simulation.

Myoungsoo Jung

Jie Zhang

Ahmed H. M. O. Abulila

IEEE Comput. Archit. Lett., 2018

Leveraging Power-Performance Relationship of Energy-Efficient Modern DRAM Devices.

IEEE Access, 2018

FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs.

Mahmut Taylan Kandemir

Jihong Kim

Myoungsoo Jung

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Amber*: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Application-Transparent Near-Memory Processing Architecture with Memory Channel Network.

Seungwon Min

Hadi Asgharimoghaddam

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A load balancing technique for memory channels.

Proceedings of the International Symposium on Memory Systems, 2018

Load-Triggered Warp Approximation on GPU.

Zhenhong Liu

Daniel Wong

Proceedings of the International Symposium on Low Power Electronics and Design, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling for GPUs.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

CTA-Aware Prefetching and Scheduling for GPU.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Simulating PCI-Express Interconnect for Future System Exploration.

Krishna Parasuram Srinivasan

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks.

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

VIP: Virtual Performance-State for Efficient Power Management of Virtual Machines.

Proceedings of the ACM Symposium on Cloud Computing, 2018

Practical Challenges in Supporting Function in Memory.

Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

In-DRAM near-data approximate acceleration for GPUs.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Smart Gait-Aid Glasses for Parkinson's Disease Patients.

IEEE Trans. Biomed. Eng., 2017

Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era.

IEEE Micro, 2017

Pageforge: a near-memory content-aware page-merging architecture.

Dimitrios Skarlatos

Josep Torrellas

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

dist-gem5: Distributed simulation of computer clusters.

Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Temporal codes in on-chip interconnects.

Michael Mishkin

Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Understanding power-performance relationship of energy-efficient modern DRAM devices.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Understanding system characteristics of online erasure coding on scalable, distributed and large-scale SSD array systems.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Rebooting the Data Access Hierarchy of Computing Systems.

Wen-mei W. Hwu

Izzat El Hajj

Simon Garcia De Gonzalo

Proceedings of the IEEE International Conference on Rebooting Computing, 2017

Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism.

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

G-Scalar: Cost-Effective Generalized Scalar Execution Architecture for Power-Efficient GPUs.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

NCAP: Network-Driven, Packet Context-Aware Power Management for Client-Server Architecture.

Ahmed H. M. O. Abulila

Lokesh Jindal

Daehoon Kim

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Janus: supporting heterogeneous power management in virtualized environments.

Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016

Workload-Aware Optimal Power Allocation on Single-Chip Heterogeneous Processors.

IEEE Trans. Parallel Distributed Syst., 2016

SpinWise: A Practical Energy-Efficient Synchronization Technique for CMPs.

Hadi Asgharimoghaddam

SIGARCH Comput. Archit. News, 2016

Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules.

IEEE Micro, 2016

Exploring new features of high-bandwidth memory for GPUs.

IEICE Electron. Express, 2016

Approximate Computing: A Survey.

Qiang Xu

Todd Mytkowicz

IEEE Des. Test, 2016

Guest Editors' Introduction: Approximate Computing.

Qiang Xu

Todd Mytkowicz

IEEE Des. Test, 2016

pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems.

Daehoon Kim

Robert C. N. Pilawa-Podgurski

IEEE Comput. Archit. Lett., 2016

Snatch: Opportunistically reassigning power allocation between processor and memory in 3D stacks.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems.

Young Hoon Son

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

On Effective and Efficient Quality Management for Approximate Computing.

Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Bit Serializing a Microprocessor for Ultra-low-power.

Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Fine-Grained Task Migration for Graph Algorithms Using Processing in Memory.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

CNFET-based high throughput register file architecture.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

VARIUS-TC: A modular architecture-level model of parametric variation for thin-channel switches.

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

DUANG: Fast and lightweight page migration in asymmetric memory systems.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ScalCore: Designing a core for voltage scalability.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Approximating warps with intra-warp operand value similarity.

Daniel Wong

Murali Annavaram

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

VR-scale: runtime dynamic phase scaling of processor voltage regulators for improving power efficiency.

Abhishek Arvind Sinkar

Indrani Paul

Srinivasan Narayanamoorthy

Proceedings of the 53rd Annual Design Automation Conference, 2016

2015

Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications.

Zhenhong Liu

Taejoon Park

IEEE Trans. Very Large Scale Integr. Syst., 2015

Decoupled Control and Data Processing for Approximate Near-Threshold Voltage Computing.

Ismail Akturk

IEEE Micro, 2015

DRAMA: An Architecture for Accelerated Processing Near Memory.

Sankaralingam Panneerselvam

IEEE Comput. Archit. Lett., 2015

Bolt: Faster Reconfiguration in Operating Systems.

Michael M. Swift

Proceedings of the 2015 USENIX Annual Technical Conference, 2015

vCache: architectural support for transparent and isolated virtual LLCs in virtualized environments.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

GPU register file virtualization.

Hyeran Jeon

Gokul Subramanian Ravi

Murali Annavaram

Proceedings of the 48th International Symposium on Microarchitecture, 2015

COP: to compress and protect main memory.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Comparison of single-ISA heterogeneous versus wide dynamic range processors for mobile applications.

Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

CiDRA: A cache-inspired DRAM resilience architecture.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

iPatch: Intelligent fault patching to improve energy efficiency.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors.

Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

2014

Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Energy-Efficient Pixel-Arithmetic.

IEEE Trans. Computers, 2014

Optimization of a Cell Counting Algorithm for Mobile Point-of-Care Testing Platforms.

Sensors, 2014

Low-cost scratchpad memory organizations using heterogeneous cell sizes for low-voltage operations.

Taejoon Park

Microprocess. Microsystems, 2014

Maximizing throughput of power/thermal-constrained processors by balancing power consumption of cores.

Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems.

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Quantitative comparison of the power reduction techniques for Samsung reconfigurable processor.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

Row-buffer decoupling: A case for low-latency DRAM microarchitecture.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers.

Yanpei Liu

Stark C. Draper

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Fair share: Allocation of GPU resources for both performance and fairness.

Paula Aguilera

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Precision-aware soft error protection for GPUs.

[BibT_eX]

[DOI]

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Accordion: Toward soft Near-Threshold Voltage Computing.

Ismail Akturk

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking.

Paula Aguilera

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

QoS-aware dynamic resource allocation for spatial-multitasking GPUs.

Paula Aguilera

Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Memory scheduling towards high-throughput cooperative heterogeneous computing.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

RCS: runtime resource and core scaling for power-constrained multi-core processors.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Clamping Virtual Supply Voltage of Power-Gated Circuits for Active Leakage Reduction and Gate-Oxide Reliability.

Taejoon Park

IEEE Trans. Very Large Scale Integr. Syst., 2013

Improving Throughput of Power-Constrained Many-Core Processors Based on Unreliable Devices.

IEEE Micro, 2013

Resilient High-Performance Processors with Spare RIBs.

IEEE Micro, 2013

Coping with Parametric Variation at Near-Threshold Voltages.

Josep Torrellas

IEEE Micro, 2013

Queuing Theoretic Analysis of Power-performance Tradeoff in Power-efficient Computing.

Yanpei Liu

Stark C. Draper

CoRR, 2013

Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.

Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

GPUWattch: enabling energy optimizations in GPGPUs.

Jingwen Leng

Tayler H. Hetherington

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Improving platform energy: chip area trade-off in near-threshold computing environment.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Reevaluating the latency claims of 3D stacked memories.

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012

Analyzing Potential Throughput Improvement of Power- and Thermal-Constrained Multicore Processors by Exploiting DVFS and PCPG.

IEEE Trans. Very Large Scale Integr. Syst., 2012

Maximizing Frequency and Yield of Power-Constrained Designs Using Programmable Power-Gating.

IEEE Trans. Very Large Scale Integr. Syst., 2012

Analyzing the Impact of Joint Optimization of Cell Size, Redundancy, and ECC on Low-Voltage SRAM Array Total Area.

IEEE Trans. Very Large Scale Integr. Syst., 2012

The case for GPGPU spatial multitasking.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Mitigating random variation with spare RIBs: Redundant intermediate bitslices.

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages.

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Workload-aware voltage regulator optimization for power efficient multi-core processors.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Cost-effective power delivery to support per-core voltage domains for power-constrained processors.

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A Linear Algebra Core Design for Efficient Level-3 BLAS.

Ardavan Pedram

Robert A. van de Geijn

Andreas Gerstlauer

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Virtual Floating-Point Units for Low-Power Embedded Processors.

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Workload and power budget partitioning for single-chip heterogeneous processors.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads.

Vijay Sathish

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Power-efficient computing for compute-intensive GPGPU applications.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Analyzing the performance and energy impact of 3D memory integration on embedded DSPs.

Daniel W. Chang

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

A low cost approach to calibrate on-chip thermal sensors.

Krishna Bharath

Chunhua Yao

Parameswaran Ramanathan

Kewal K. Saluja

Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation.

Paritosh Pratap Ajgaonkar

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors.

Stark C. Draper

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Time redundant parity for low-cost transient error detection.

Proceedings of the Design, Automation and Test in Europe, 2011

Scratchpad memory optimizations for digital signal processing applications.

Proceedings of the Design, Automation and Test in Europe, 2011

AVS-aware power-gate sizing for maximum performance and power efficiency of power-constrained processors.

Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Energy-efficient floating-point arithmetic for software-defined radio architectures.

Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Energy-efficient floating-point arithmetic for digital signal processors.

Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Combating Aging with the Colt Duty Cycle Equalizer.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Analyzing and minimizing effects of temperature variation and NBTI on active leakage power of power-gated circuits.

Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

The compatibility analysis of thread migration and DVFS in multi-core processor.

Dongkeun Oh

Charlie Chung-Ping Chen

Yu Hen Hu

Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

Workload-adaptive process tuning strategy for power-efficient multi-core processors.

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Minimizing total area of low-voltage SRAM arrays through joint optimization of cell size, redundancy, and ECC.

Proceedings of the 28th International Conference on Computer Design, 2010

Optimal algorithm for profile-based power gating: A compiler technique for reducing leakage on execution units in microprocessors.

Proceedings of the 2010 International Conference on Computer-Aided Design, 2010

Runtime temperature-based power estimation for optimizing throughput of thermal-constrained multi-core processors.

Dongkeun Oh

Charlie Chung-Ping Chen

Azadeh Davoodi

Yu Hen Hu

Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

Analyzing impact of multiple ABB and AVS domains on throughput of power and thermal-constrained multi-core processors.

Shi-Ting Zhou

Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009

Analyzing potential power reduction with adaptive voltage positioning optimized for multicore processors.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Optimizing total power of many-core processors considering voltage scaling limit and process variations.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Frequency and yield optimization using power gates in power-constrained designs.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Statistical static timing analysis considering leakage variability in power gated designs.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Optimizing throughput of power- and thermal-constrained multicore processors using DVFS and per-core power-gating.

Proceedings of the 46th Design Automation Conference, 2009

2008

On-chip cache device scaling limits and effective fault repair techniques in future nanoscale technology.

David Roberts

Microprocess. Microsystems, 2008

2007

Yield-driven near-threshold SRAM design.

Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

2005

Quantitative analysis and optimization techniques for on-chip cache leakage power.

David T. Blaauw

IEEE Trans. Very Large Scale Integr. Syst., 2005

Total leakage optimization strategies for multi-level caches.

Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Power-Performance Trade-Offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage.

Proceedings of the 2005 Design, 2005

2004

Circuit and microarchitectural techniques for reducing cache leakage power.

IEEE Trans. Very Large Scale Integr. Syst., 2004

Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation.

IEEE Micro, 2004

Microarchitectural power modeling techniques for deep sub-micron microprocessors.

Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003

Leakage Current: Moore's Law Meets Static Power.

Narayanan Vijaykrishnan

Computer, 2003

Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

The microarchitecture of a low power register file.

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Reducing register ports using delayed write-back queues and operand pre-fetch.

Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches.

David T. Blaauw

Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

A 2.3Gb/s fully integrated and synthesizable AES Rijndael core.
