Nam Sung Kim

Orcid: 0000-0002-0442-5634

Affiliations:
  • University of Illinois, Urbana-Champaign, IL, USA


According to our database, Nam Sung Kim authored at least 231 papers between 2002 and 2024.

Awards

ACM Fellow

ACM Fellow 2020, "For contributions to design and modeling of power-efficient computer architectures".

IEEE Fellow

IEEE Fellow 2016, "For contribution to circuits and architectures for power-efficient microprocessors".

Bibliography

2024
Harmonic: Hardware-assisted RDMA Performance Isolation for Public Clouds.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

An LPDDR-based CXL-PNM Platform for TCO-efficient Inference of Transformer-based Large Language Models.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

ScaleCache: A Scalable Page Cache for Multiple Solid-State Drives.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

TAROT: A CXL SmartNIC-Based Defense Against Multi-bit Errors by Row-Hammer Attacks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Tandem Processor: Grappling with Emerging Operators in Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Rethinking DRAM's Page Mode With STT-MRAM.
IEEE Trans. Computers, May, 2023

Special Issue on Emerging System Interconnects.
IEEE Micro, 2023

A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors.
CoRR, 2023

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices.
CoRR, 2023

Asynchronous Persistence with ASAP.
CoRR, 2023

Defensive ML: Defending Architectural Side-channels with Adversarial Obfuscation.
CoRR, 2023

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands.
IEEE Comput. Archit. Lett., 2023

LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads.
IEEE Comput. Archit. Lett., 2023

Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models.
IEEE Comput. Archit. Lett., 2023

STYX: Exploiting SmartNIC Capability to Reduce Datacenter Memory Tax.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Analyzing Energy Efficiency of a Server with a SmartNIC under SLO Constraints.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

MESA: Microarchitecture Extensions for Spatial Architecture Generation.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Making Sense of Using a SmartNIC to Reduce Datacenter Tax from SLO and TCO Perspectives.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Rambda: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Towards a Manageable Intra-Host Network.
Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 2023

2022
DML: Dynamic Partial Reconfiguration With Scalable Task Scheduling for Multi-Applications on FPGAs.
IEEE Trans. Computers, 2022

OSC: An Online Self-Configuring Big Data Framework for Optimization of QoS.
IEEE Trans. Computers, 2022

Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers.
Proc. VLDB Endow., 2022

Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM.
IEEE Micro, 2022

Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer.
IEEE Micro, 2022

Coordinated Science Laboratory 70th Anniversary Symposium: The Future of Computing.
CoRR, 2022

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications.
CoRR, 2022

Unlocking the Power of Inline Floating-Point Operations on Programmable Switches.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling.
Proceedings of Machine Learning and Systems 2022, 2022

IDIO: Network-Driven, Inbound Network Data Orchestration on Server Processors.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

ASAP: architecture support for asynchronous persistence.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication.
Proceedings of the Tenth International Conference on Learning Representations, 2022

An FPGA-based RNN-T Inference Accelerator with PIM-HBM.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

2021
Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures.
Microprocess. Microsystems, September, 2021

BabelFish: Fusing Address Translations for Containers.
IEEE Micro, 2021

An 8.5-Gb/s/Pin 12-Gb LPDDR5 SDRAM With a Hybrid-Bank Architecture, Low Power, and Speed-Boosting Techniques.
IEEE J. Solid State Circuits, 2021

A 16-GB 640-GB/s HBM2E DRAM With a Data-Bus Window Extension Technique and a Synergetic On-Die ECC Scheme.
IEEE J. Solid State Circuits, 2021

IDIO: Orchestrating Inbound Network Data on Server Processors.
IEEE Comput. Archit. Lett., 2021

GreenDIMM: OS-assisted DRAM Power Management for DRAM with a Sub-array Granularity Power-Down State.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

NMAP: Power Management Based on Network Packet Processing Mode Transition for Latency-Critical Workloads.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

25.2 A 16Gb Sub-1V 7.14Gb/s/pin LPDDR5 SDRAM Applying a Mosaic Architecture with a Short-Feedback 1-Tap DFE, an FSS Bus with Low-Level Swing and an Adaptively Controlled Body Biasing in a 3rd-Generation 10nm DRAM.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Don't Forget the I/O When Allocating Your LLC.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

QEI: Query Acceleration Can be Generic and Efficient in the Cloud.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Doing more with less: training large DNN models on commodity servers for the masses.
Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021

Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond.
Proceedings of the IEEE Hot Chips 33 Symposium, 2021

DiAG: a dataflow-inspired architecture for general-purpose processors.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
Errata to "Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters".
IEEE Trans. Parallel Distributed Syst., 2020

Design and Implementation of SSD-Assisted Backup and Recovery for Database Systems.
IEEE Trans. Knowl. Data Eng., 2020

IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform.
CoRR, 2020

FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack.
IEEE Comput. Archit. Lett., 2020

Network Packet Processing Mode-Aware Power Management for Data Center Servers.
IEEE Comput. Archit. Lett., 2020

Leveraging Dynamic Partial Reconfiguration with Scalable ILP Based Task Scheduling.
Proceedings of the 33rd International Conference on VLSI Design and 19th International Conference on Embedded Systems, 2020

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Data Direct I/O Characterization for Future I/O System Exploration.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Bit-Parallel Vector Composability for Neural Acceleration.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters.
IEEE Trans. Parallel Distributed Syst., 2019

An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns.
ACM Trans. Archit. Code Optim., 2019

An Energy-Efficient Programmable Mixed-Signal Accelerator for Machine Learning Algorithms.
IEEE Micro, 2019

Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic.
CoRR, 2019

Exploiting OS-Level Memory Offlining for DRAM Power Management.
IEEE Comput. Archit. Lett., 2019

Ghost routers: energy-efficient asymmetric multicore processors with symmetric NoCs.
Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

NetDIMM: Low-Latency Near-Memory Network Interface Architecture.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

SMART: STT-MRAM architecture for smart activation and sensing.
Proceedings of the International Symposium on Memory Systems, 2019

Near-Memory and In-Storage FPGA Acceleration for Emerging Cognitive Computing Workloads.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

A²M: Approximate Algebraic Memory Using Polynomials Rings.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

AxMemo: hardware-compiler co-design for approximate code memoization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

LL-PCM: Low-Latency Phase Change Memory Architecture.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Practical Near-Data Processing to Evolve Memory and Storage Devices into Mainstream Heterogeneous Computing Systems.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Approximate Ultra-Low Voltage Many-Core Processor Design.
Proceedings of the Approximate Circuits, Methodologies and CAD, 2019

2018
CNFET-Based High Throughput SIMD Architecture.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning.
IEEE Micro, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.
CoRR, 2018

Semi-Coherent DMA: An Alternative I/O Coherency Management for Embedded Systems.
IEEE Comput. Archit. Lett., 2018

SimpleSSD: Modeling Solid State Drives for Holistic System Simulation.
IEEE Comput. Archit. Lett., 2018

Leveraging Power-Performance Relationship of Energy-Efficient Modern DRAM Devices.
IEEE Access, 2018

FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Amber*: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Application-Transparent Near-Memory Processing Architecture with Memory Channel Network.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A load balancing technique for memory channels.
Proceedings of the International Symposium on Memory Systems, 2018

Load-Triggered Warp Approximation on GPU.
Proceedings of the International Symposium on Low Power Electronics and Design, 2018

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling for GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

CTA-Aware Prefetching and Scheduling for GPU.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Simulating PCI-Express Interconnect for Future System Exploration.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

VIP: Virtual Performance-State for Efficient Power Management of Virtual Machines.
Proceedings of the ACM Symposium on Cloud Computing, 2018

Practical Challenges in Supporting Function in Memory.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

In-DRAM near-data approximate acceleration for GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

3D-Xpath: high-density managed DRAM architecture with cost-effective alternative paths for memory transactions.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Smart Gait-Aid Glasses for Parkinson's Disease Patients.
IEEE Trans. Biomed. Eng., 2017

Heterogeneous Computing Meets Near-Memory Acceleration and High-Level Synthesis in the Post-Moore Era.
IEEE Micro, 2017

Pageforge: a near-memory content-aware page-merging architecture.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

dist-gem5: Distributed simulation of computer clusters.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Temporal codes in on-chip interconnects.
Proceedings of the 2017 IEEE/ACM International Symposium on Low Power Electronics and Design, 2017

Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Understanding power-performance relationship of energy-efficient modern DRAM devices.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Understanding system characteristics of online erasure coding on scalable, distributed and large-scale SSD array systems.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Rebooting the Data Access Hierarchy of Computing Systems.
Proceedings of the IEEE International Conference on Rebooting Computing, 2017

Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

G-Scalar: Cost-Effective Generalized Scalar Execution Architecture for Power-Efficient GPUs.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

NCAP: Network-Driven, Packet Context-Aware Power Management for Client-Server Architecture.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Janus: supporting heterogeneous power management in virtualized environments.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016
Workload-Aware Optimal Power Allocation on Single-Chip Heterogeneous Processors.
IEEE Trans. Parallel Distributed Syst., 2016

SpinWise: A Practical Energy-Efficient Synchronization Technique for CMPs.
SIGARCH Comput. Archit. News, 2016

Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules.
IEEE Micro, 2016

Exploring new features of high-bandwidth memory for GPUs.
IEICE Electron. Express, 2016

Approximate Computing: A Survey.
IEEE Des. Test, 2016

Guest Editors' Introduction: Approximate Computing.
IEEE Des. Test, 2016

pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems.
IEEE Comput. Archit. Lett., 2016

Snatch: Opportunistically reassigning power allocation between processor and memory in 3D stacks.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

On Effective and Efficient Quality Management for Approximate Computing.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Bit Serializing a Microprocessor for Ultra-low-power.
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Fine-Grained Task Migration for Graph Algorithms Using Processing in Memory.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

CNFET-based high throughput register file architecture.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

VARIUS-TC: A modular architecture-level model of parametric variation for thin-channel switches.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

DUANG: Fast and lightweight page migration in asymmetric memory systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ScalCore: Designing a core for voltage scalability.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Approximating warps with intra-warp operand value similarity.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

VR-scale: runtime dynamic phase scaling of processor voltage regulators for improving power efficiency.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Decoupled Control and Data Processing for Approximate Near-Threshold Voltage Computing.
IEEE Micro, 2015

DRAMA: An Architecture for Accelerated Processing Near Memory.
IEEE Comput. Archit. Lett., 2015

Bolt: Faster Reconfiguration in Operating Systems.
Proceedings of the 2015 USENIX Annual Technical Conference, 2015

vCache: architectural support for transparent and isolated virtual LLCs in virtualized environments.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

GPU register file virtualization.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

COP: to compress and protect main memory.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Comparison of single-ISA heterogeneous versus wide dynamic range processors for mobile applications.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Alloy: Parallel-serial memory channel architecture for single-chip heterogeneous processor systems.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

CiDRA: A cache-inspired DRAM resilience architecture.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

iPatch: Intelligent fault patching to improve energy efficiency.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

2014
Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Energy-Efficient Pixel-Arithmetic.
IEEE Trans. Computers, 2014

Optimization of a Cell Counting Algorithm for Mobile Point-of-Care Testing Platforms.
Sensors, 2014

Low-cost scratchpad memory organizations using heterogeneous cell sizes for low-voltage operations.
Microprocess. Microsystems, 2014

Maximizing throughput of power/thermal-constrained processors by balancing power consumption of cores.
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, 2014

Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Quantitative comparison of the power reduction techniques for samsung reconfigurable processor.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

Row-buffer decoupling: A case for low-latency DRAM microarchitecture.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Fair share: Allocation of GPU resources for both performance and fairness.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Precision-aware soft error protection for GPUs.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Accordion: Toward soft Near-Threshold Voltage Computing.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

QoS-aware dynamic resource allocation for spatial-multitasking GPUs.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

Memory scheduling towards high-throughput cooperative heterogeneous computing.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

RCS: runtime resource and core scaling for power-constrained multi-core processors.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Clamping Virtual Supply Voltage of Power-Gated Circuits for Active Leakage Reduction and Gate-Oxide Reliability.
IEEE Trans. Very Large Scale Integr. Syst., 2013

Improving Throughput of Power-Constrained Many-Core Processors Based on Unreliable Devices.
IEEE Micro, 2013

Resilient High-Performance Processors with Spare RIBs.
IEEE Micro, 2013

Coping with Parametric Variation at Near-Threshold Voltages.
IEEE Micro, 2013

Queuing Theoretic Analysis of Power-performance Tradeoff in Power-efficient Computing.
CoRR, 2013

Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

REEL: Reducing effective execution latency of floating point operations.
Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013

GPUWattch: enabling energy optimizations in GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Improving platform energy: chip area trade-off in near-threshold computing environment.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2013

EnergySmart: Toward energy-efficient manycores for Near-Threshold Computing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Reevaluating the latency claims of 3D stacked memories.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012
Analyzing Potential Throughput Improvement of Power- and Thermal-Constrained Multicore Processors by Exploiting DVFS and PCPG.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Maximizing Frequency and Yield of Power-Constrained Designs Using Programmable Power-Gating.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Analyzing the Impact of Joint Optimization of Cell Size, Redundancy, and ECC on Low-Voltage SRAM Array Total Area.
IEEE Trans. Very Large Scale Integr. Syst., 2012

The case for GPGPU spatial multitasking.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Mitigating random variation with spare RIBs: Redundant intermediate bitslices.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Workload-aware voltage regulator optimization for power efficient multi-core processors.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Cost-effective power delivery to support per-core voltage domains for power-constrained processors.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

A Linear Algebra Core Design for Efficient Level-3 BLAS.
Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Virtual Floating-Point Units for Low-Power Embedded Processors.
Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, 2012

Workload and power budget partitioning for single-chip heterogeneous processors.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Power-efficient computing for compute-intensive GPGPU applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Analyzing the performance and energy impact of 3D memory integration on embedded DSPs.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

A low cost approach to calibrate on-chip thermal sensors.
Proceedings of the 12th International Symposium on Quality Electronic Design, 2011

Analyzing throughput of GPGPUs exploiting within-die core-to-core frequency variation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Low-voltage on-chip cache architecture using heterogeneous cell sizes for high-performance processors.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Time redundant parity for low-cost transient error detection.
Proceedings of the Design, Automation and Test in Europe, 2011

Scratchpad memory optimizations for digital signal processing applications.
Proceedings of the Design, Automation and Test in Europe, 2011

AVS-aware power-gate sizing for maximum performance and power efficiency of power-constrained processors.
Proceedings of the 16th Asia South Pacific Design Automation Conference, 2011

Energy-efficient floating-point arithmetic for software-defined radio architectures.
Proceedings of the 22nd IEEE International Conference on Application-specific Systems, 2011

Energy-efficient floating-point arithmetic for digital signal processors.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Combating Aging with the Colt Duty Cycle Equalizer.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Analyzing and minimizing effects of temperature variation and NBTI on active leakage power of power-gated circuits.
Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

The compatibility analysis of thread migration and DVFS in multi-core processor.
Proceedings of the 11th International Symposium on Quality of Electronic Design (ISQED 2010), 2010

Workload-adaptive process tuning strategy for power-efficient multi-core processors.
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Minimizing total area of low-voltage SRAM arrays through joint optimization of cell size, redundancy, and ECC.
Proceedings of the 28th International Conference on Computer Design, 2010

Optimal algorithm for profile-based power gating: A compiler technique for reducing leakage on execution units in microprocessors.
Proceedings of the 2010 International Conference on Computer-Aided Design, 2010

Runtime temperature-based power estimation for optimizing throughput of thermal-constrained multi-core processors.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

Analyzing impact of multiple ABB and AVS domains on throughput of power and thermal-constrained multi-core processors.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009
Analyzing potential power reduction with adaptive voltage positioning optimized for multicore processors.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Optimizing total power of many-core processors considering voltage scaling limit and process variations.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Frequency and yield optimization using power gates in power-constrained designs.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Statistical static timing analysis considering leakage variability in power gated designs.
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Optimizing throughput of power- and thermal-constrained multicore processors using DVFS and per-core power-gating.
Proceedings of the 46th Design Automation Conference, 2009

2008
On-chip cache device scaling limits and effective fault repair techniques in future nanoscale technology.
Microprocess. Microsystems, 2008

2007
Yield-driven near-threshold SRAM design.
Proceedings of the 2007 International Conference on Computer-Aided Design, 2007

2005
Quantitative analysis and optimization techniques for on-chip cache leakage power.
IEEE Trans. Very Large Scale Integr. Syst., 2005

Total leakage optimization strategies for multi-level caches.
Proceedings of the 15th ACM Great Lakes Symposium on VLSI 2005, 2005

Power-Performance Trade-Offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage.
Proceedings of the 2005 Design, 2005

2004
Circuit and microarchitectural techniques for reducing cache leakage power.
IEEE Trans. Very Large Scale Integr. Syst., 2004

Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation.
IEEE Micro, 2004

Microarchitectural power modeling techniques for deep sub-micron microprocessors.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003
Leakage Current: Moore's Law Meets Static Power.
Computer, 2003

Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

The microarchitecture of a low power register file.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Reducing register ports using delayed write-back queues and operand pre-fetch.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches.
Proceedings of the 2003 International Conference on Computer-Aided Design, 2003

A 2.3Gb/s fully integrated and synthesizable AES Rijndael core.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2003

2002
Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Drowsy Caches: Simple Techniques for Reducing Leakage Power.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

