Onur Mutlu

According to our database, Onur Mutlu authored at least 293 papers between 2003 and 2018.

Awards

ACM Fellow

ACM Fellow 2017, "For contributions to computer architecture research, especially in memory systems".

Bibliography

2018
Mosaic: Enabling Application-Transparent Support for Multiple Page Sizes in Throughput Processors.
Operating Systems Review, 2018

Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights.
PVLDB, 2018

ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms.
POMACS, 2018

Iterative Modulo Scheduling.
IEEE Micro, 2018

Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions.
CoRR, 2018

SLIDER: Fast and Efficient Computation of Banded Sequence Alignment.
CoRR, 2018

D-RaNGe: Violating DRAM Timing Constraints for High-Throughput True Random Number Generation using Commodity DRAM Devices.
CoRR, 2018

Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation.
CoRR, 2018

What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study.
CoRR, 2018

Techniques for Efficiently Handling Power Surges in Fuel Cell Powered Data Centers: Modeling, Analysis, Results.
CoRR, 2018

Recent Advances in DRAM and Flash Memory Architectures.
CoRR, 2018

Recent Advances in Overcoming Bottlenecks in Memory Systems and Managing Memory Resources in GPU Systems.
CoRR, 2018

Predictable Performance and Fairness Through Accurate Slowdown Estimation in Shared Main Memory Systems.
CoRR, 2018

Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency.
CoRR, 2018

RowClone: Accelerating Data Movement and Initialization Using DRAM.
CoRR, 2018

Characterizing, Exploiting, and Mitigating Vulnerabilities in MLC NAND Flash Memory Programming.
CoRR, 2018

Read Disturb Errors in MLC NAND Flash Memory.
CoRR, 2018

SoftMC: Practical DRAM Characterization Using an FPGA-Based Infrastructure.
CoRR, 2018

LISA: Increasing Internal Connectivity in DRAM for Fast Data Movement and Low Latency.
CoRR, 2018

Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency.
CoRR, 2018

Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
CoRR, 2018

Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost.
CoRR, 2018

Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins.
CoRR, 2018

Experimental Characterization, Optimization, and Recovery of Data Retention Errors in MLC NAND Flash Memory.
CoRR, 2018

Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance.
CoRR, 2018

Exploiting the DRAM Microarchitecture to Increase Memory-Level Parallelism.
CoRR, 2018

Reducing DRAM Refresh Overheads with Refresh-Access Parallelism.
CoRR, 2018

ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms.
CoRR, 2018

Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs.
CoRR, 2018

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems.
CoRR, 2018

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems.
CoRR, 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.
CoRR, 2018

Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management.
CoRR, 2018

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions.
CoRR, 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.
CoRR, 2018

Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation.
Proceedings of the Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, 2018

What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study.
Proceedings of the Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, 2018

ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms.
Proceedings of the Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality In GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

HICOMB Keynote 2.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

A Large Scale Study of Data Center Network Reliability.
Proceedings of the Internet Measurement Conference 2018, 2018

Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data.
Proceedings of the 32nd International Conference on Supercomputing, 2018

HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

VRL-DRAM: improving DRAM performance via variable refresh latency.
Proceedings of the 55th Annual Design Automation Conference, 2018

LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

SPECTR: Formal Supervisory Control and Coordination for Many-core Systems Resource Management.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.
POMACS, 2017

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms.
POMACS, 2017

Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives.
Proceedings of the IEEE, 2017

Improving DRAM Performance by Parallelizing Refreshes with Accesses.
CoRR, 2017

Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery.
CoRR, 2017

Improving Multi-Application Concurrency Support Within the GPU Memory System.
CoRR, 2017

Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation.
CoRR, 2017

The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser.
CoRR, 2017

Using ECC DRAM to Adaptively Increase Memory Capacity.
CoRR, 2017

Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency.
CoRR, 2017

Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms.
CoRR, 2017

Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid-State Drives.
CoRR, 2017

LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures.
CoRR, 2017

A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM.
Computer Architecture Letters, 2017

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory.
Computer Architecture Letters, 2017

GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Bioinformatics, 2017

Chapter Four - Simple Operations in Memory to Reduce Data Movement.
Advances in Computers, 2017

Concurrent Data Structures for Near-Memory Computing.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017

Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms.
Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, Urbana-Champaign, IL, USA, June 05, 2017

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Detecting and mitigating data-dependent DRAM failures by exploiting current memory content.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation.
Proceedings of the International Conference on Supercomputing, 2017

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

The RowHammer problem and other issues we may face as memory becomes denser.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

Utility-Based Hybrid Memory Management.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling.
IEEE Trans. Parallel Distrib. Syst., 2016

RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.
TACO, 2016

DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators.
TACO, 2016

Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost.
TACO, 2016

Bounding and reducing memory interference in COTS-based multi-core systems.
Real-Time Systems, 2016

A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.
Parallel Computing, 2016

The 2014 MICRO Test of Time Award Winners: From 1978 to 1992.
IEEE Micro, 2016

Common Bonds: MIPS, HPS, Two-Level Branch Prediction, and Compressed Code RISC Processor.
IEEE Micro, 2016

Enabling Accurate and Practical Online Flash Channel Modeling for Modern MLC NAND Flash Memory.
IEEE Journal on Selected Areas in Communications, 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.
IEEE Design & Test, 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016

The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR.
CoRR, 2016

Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM.
CoRR, 2016

Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance.
CoRR, 2016

Tiered-Latency DRAM (TL-DRAM).
CoRR, 2016

Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.
CoRR, 2016

Adaptive-Latency DRAM (AL-DRAM).
CoRR, 2016

RowHammer: Reliability Analysis and Security Implications.
CoRR, 2016

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism.
CoRR, 2016

Reducing Performance Impact of DRAM Refresh by Parallelizing Refreshes with Accesses.
CoRR, 2016

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.
CoRR, 2016

GateKeeper: Enabling Fast Pre-Alignment in DNA Short Read Mapping with a New Streaming Accelerator Architecture.
CoRR, 2016

Ramulator: A Fast and Extensible DRAM Simulator.
Computer Architecture Letters, 2016

Optimal seed solver: optimizing seed selection in read mapping.
Bioinformatics, 2016

Exploiting Core Criticality for Enhanced GPU Performance.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2016

Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2016

Keynote: rethinking memory system design.
Proceedings of the 2016 International Symposium on Rapid System Prototyping, 2016

Yak: A High-Performance Big-Data-Friendly Garbage Collector.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

NVMOVE: Helping Programmers Move to Byte-Based Persistence.
Proceedings of the 4th Workshop on Interactions of NVM/Flash with Operating Systems and Workloads, 2016

Zorua: A holistic approach to resource virtualization in GPUs.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Continuous runahead: Transparent hardware acceleration for memory intensive workloads.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Accelerating Dependent Cache Misses with an Enhanced Memory Controller.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

A model for Application Slowdown Estimation in on-chip networks and its use for improving system fairness and performance.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A case for toggle-aware compression for GPU systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

SizeCap: Efficiently handling power surges in fuel cell powered data centers.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ChargeCache: Reducing DRAM latency by exploiting row access locality.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016

Invited - Who is the major threat to tomorrow's security?: you, the hardware designer.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
High-Performance and Lightweight Transaction Support in Flash-Based SSDs.
IEEE Trans. Computers, 2015

Introducing the MICRO Test of Time Awards: Concept, Process, 2014 Winners, and the Future.
IEEE Micro, 2015

Optimal Seed Solver: Optimizing Seed Selection in Read Mapping.
CoRR, 2015

SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators.
CoRR, 2015

The Blacklisting Memory Scheduler: Balancing Performance, Fairness and Complexity.
CoRR, 2015

Managing Hybrid Main Memories with a Page-Utility Driven Performance Model.
CoRR, 2015

Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.
CoRR, 2015

Fast Bulk Bitwise AND and OR in DRAM.
Computer Architecture Letters, 2015

Toggle-Aware Compression for GPUs.
Computer Architecture Letters, 2015

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.
Bioinformatics, 2015

A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters.
Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2015

A Large-Scale Study of Flash Memory Failures in the Field.
Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015

Rethinking memory system design for data-intensive computing.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

WARM: Improving NAND flash memory lifetime with write-hotness aware retention management.
Proceedings of the IEEE 31st Symposium on Mass Storage Systems and Technologies, 2015

Amnesic cache management for non-volatile memory.
Proceedings of the IEEE 31st Symposium on Mass Storage Systems and Technologies, 2015

The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

ThyNVM: enabling software-transparent crash consistency in persistent memory systems.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Rethinking Memory System Design (along with Interconnects).
Proceedings of the 8th International Workshop on Network on Chip Architectures, 2015

Comparative evaluation of FPGA and ASIC implementations of bufferless and buffered routing algorithms for on-chip networks.
Proceedings of the Sixteenth International Symposium on Quality Electronic Design, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A scalable processing-in-memory accelerator for parallel graph processing.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploiting compressed block size as an indicator of future reuse.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Adaptive-latency DRAM: Optimizing DRAM timing for the common-case.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Data retention in MLC NAND flash memory: Characterization, optimization, and recovery.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Read Disturb Errors in MLC NAND Flash Memory: Characterization, Mitigation, and Recovery.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

2014
Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories.
TACO, 2014

Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks.
TACO, 2014

The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Neighbor-cell assisted error correction for MLC NAND flash memories.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014

Design and Evaluation of Hierarchical Rings with Deflection Routing.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Bounding memory interference delay in COTS-based multi-core systems.
Proceedings of the 20th IEEE Real-Time and Embedded Technology and Applications Symposium, 2014

FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Managing GPU Concurrency in Heterogeneous Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

The Dirty-Block Index.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Loose-Ordering Consistency for persistent memory.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

The heterogeneous block architecture.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

Improving cache performance using read-write partitioning.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Improving DRAM performance by parallelizing refreshes with accesses.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Warp-aware trace scheduling for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Memory Systems.
Proceedings of the Computing Handbook, 2014

2013
Accelerating read mapping with FastHASH.
BMC Genomics, 2013

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Linearly compressed pages: a low-complexity, low-latency main memory compression framework.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

EMERALD: Characterization of emerging applications and algorithms for low-power devices.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Evaluating STT-RAM as an energy-efficient main memory alternative.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Orchestrated scheduling and prefetching for GPGPUs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Utility-based acceleration of multithreaded applications on asymmetric CMPs.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

LightTx: A lightweight transactional design in flash-based SSDs to support flexible transactions.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

MISE: Providing performance predictability and improving fairness in shared main memory systems.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Tiered-latency DRAM: A low latency and low cost DRAM architecture.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Application-to-core mapping policies to reduce memory system interference in multi-core systems.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Threshold voltage distribution in MLC NAND flash memory: characterization, analysis, and modeling.
Proceedings of the Design, Automation and Test in Europe, 2013

A heterogeneous multiple network-on-chip design: an application-aware approach.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2013

2012
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems.
ACM Trans. Comput. Syst., 2012

A QoS-Enabled On-Die Interconnect Fabric for Kilo-Node Chips.
IEEE Micro, 2012

Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management.
Computer Architecture Letters, 2012

On-chip networks from a networking perspective: congestion and scalability in many-core interconnects.
Proceedings of the ACM SIGCOMM 2012 Conference, 2012

HAT: Heterogeneous Adaptive Throttling for On-Chip Networks.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect.
Proceedings of the 2012 Sixth IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2012

RAIDR: Retention-aware intelligent DRAM refresh.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

A case for exploiting subarray-level parallelism (SALP) in DRAM.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Row buffer locality aware caching policies for hybrid memories.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

A case for small row buffers in non-volatile main memories.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Bottleneck identification and scheduling in multithreaded applications.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Base-delta-immediate compression: practical data compression for on-chip caches.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-aware prefetch prioritization in on-chip networks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-to-core mapping policies to reduce memory interference in multi-core systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Prefetch-Aware Memory Controllers.
IEEE Trans. Computers, 2011

Data Marshaling for Multicore Systems.
IEEE Micro, 2011

Top Picks [Guest editors' introduction].
IEEE Micro, 2011

Thread Cluster Memory Scheduling.
IEEE Micro, 2011

Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.
IEEE Micro, 2011

FIST: A fast, lightweight, FPGA-friendly packet latency estimator for NoC modeling in full-system simulations.
Proceedings of the NOCS 2011, 2011

Improving GPU performance via large warps and two-level warp scheduling.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Reducing memory interference in multicore systems via application-aware memory channel partitioning.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Parallel application memory scheduling.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Memory systems in the many-core era: challenges, opportunities, and solution directions.
Proceedings of the 10th International Symposium on Memory Management, 2011

Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Prefetch-aware shared resource management for multi-core systems.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Memory power management via dynamic voltage/frequency scaling.
Proceedings of the 8th International Conference on Autonomic Computing, 2011

CHIPPER: A low-complexity bufferless deflection router.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
Accelerating Critical Section Execution with Asymmetric Multicore Architectures.
IEEE Micro, 2010

Phase-Change Technology and the Future of Main Memory.
IEEE Micro, 2010

Phase change memory architecture and the quest for scalability.
Commun. ACM, 2010

Concurrent autonomous self-test for uncore components in system-on-chips.
Proceedings of the 28th IEEE VLSI Test Symposium, 2010

QuaLe: A Quantum-Leap Inspired Model for Non-stationary Analysis of NoC Traffic in Chip Multi-processors.
Proceedings of the NOCS 2010, 2010

Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Data marshaling for multi-core architectures.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors.
Proceedings of the Computer Architecture, 2010

Aérgia: exploiting packet latency slack in on-chip networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Next generation on-chip networks: what kind of congestion control do we need?
Proceedings of the 9th ACM Workshop on Hot Topics in Networks. HotNets 2010, Monterey, CA, USA - October 20, 2010

Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Efficient runahead threads.
Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques, 2010

2009
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware.
IEEE Trans. Computers, 2009

A Flexible Software-Based Framework for Online Detection of Hardware Defects.
IEEE Trans. Computers, 2009

Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers.
IEEE Micro, 2009

Improving memory bank-level parallelism in the presence of prefetching.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Coordinated control of multiple prefetchers in multi-core systems.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Application-aware prioritization mechanisms for on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

A case for bufferless routing in on-chip networks.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Architecting phase change memory as a scalable dram alternative.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Flexible reference-counting-based hardware acceleration for garbage collection.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Operating system scheduling for efficient online self-test in robust systems.
Proceedings of the 2009 International Conference on Computer-Aided Design, 2009

Express Cube Topologies for on-Chip Interconnects.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Accelerating critical section execution with asymmetric multi-core architectures.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008
Guest Editors' Introduction: Interaction of Many-Core Computer Architecture and Operating Systems.
IEEE Micro, 2008

Dynamic Predication of Indirect Jumps.
Computer Architecture Letters, 2008

Distributed order scheduling and its application to multi-core dram controllers.
Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, 2008

Prefetch-Aware DRAM Controllers.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Online design bug detection: RTL analysis, flexible mechanisms, and evaluation.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Self-Optimizing Memory Controllers: A Reinforcement Learning Approach.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Performance-aware speculation control using wrong path usefulness prediction.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Improving the performance of object-oriented languages with dynamic predication of indirect jumps.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication.
IEEE Micro, 2007

Dynamic Predication of Indirect Jumps.
Computer Architecture Letters, 2007

Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems.
Proceedings of the 16th USENIX Security Symposium, Boston, MA, USA, August 6-10, 2007, 2007

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation.
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses.
IEEE Trans. Computers, 2006

Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance.
IEEE Micro, 2006

Wish Branches: Enabling Adaptive and Aggressive Predicated Execution.
IEEE Micro, 2006

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

A Case for MLP-Aware Cache Replacement.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

2005
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors.
IEEE Trans. Computers, 2005

Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References.
International Journal of Parallel Programming, 2005

On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor.
Computer Architecture Letters, 2005

Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

Techniques for Efficient Processing in Runahead Execution Engines.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

2004
Understanding the effects of wrong-path memory references on processor performance.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

2003
Runahead Execution: An Effective Alternative to Large Instruction Windows.
IEEE Micro, 2003

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

