Scott A. Mahlke

Kamalavasan Kamalakkannan

Michael Garland

Christos Kozyrakis

CoRR, August, 2025

DX100: Programmable Data Access Accelerator for Indirection.

Alireza Khadem

Zhenyan Zhu

Akash Poptani

Yufeng Gu

Jered Benjamin Dominguez-Trujillo

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE.

Proceedings of the IEEE International Symposium on Workload Characterization, 2025

Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme.

Jeongmin Brian Park

Kun Wu

Vikram Sharma Mailthody

Zaid Qureshi

Sanjay Sri Vallabh Singapuram

Wen-Mei W. Hwu

CoRR, 2024

SlimSLAM: An Adaptive Runtime for Visual-Inertial Simultaneous Localization and Mapping.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks.

ACM Trans. Embed. Comput. Syst., October, 2023

Vector-Processing for Mobile Devices: Benchmark and Analysis.

Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022

Multi-Layer In-Memory Processing.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

AVMaestro: A Centralized Policy Enforcement Framework for Safe Autonomous-driving Environments.

Ze Zhang

Proceedings of the 2022 IEEE Intelligent Vehicles Symposium, 2022

SoftFusion: A Low-Cost Approach to Enhance Reliability of Object Detection Applications.

Salar Latifi

Babak Zamirai

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

SRTuner: Effective Compiler Optimization Customization by Exposing Synergistic Relations.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Loner: utilizing the CPU vector datapath to process scalar integer data.

Armand Behroozi

Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

2021

A Systematic Framework to Identify Violations of Scenario-dependent Driving Rules in Autonomous Vehicle Software.

Proceedings of the SIGMETRICS '21: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2021

Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design.

Christos Vasiladiotis

Michael F. P. O'Boyle

Ronald G. Dreslinski

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

Path Sensitive Signatures for Control Flow Error Detection.

Ze Zhang

Sanjay Sri Vallabh Singapuram

Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020

Automatic Feature Isolation in Network Protocol Software Implementations.

Ze Zhang

Qingzhao Zhang

Brandon Nguyen

Z. Morley Mao

Proceedings of the 2020 ACM Workshop on Forming an Ecosystem Around Software Transformation, 2020

AVGuardian: Detecting and Mitigating Publish-Subscribe Overprivilege for Autonomous Vehicle Systems.

Proceedings of the IEEE European Symposium on Security and Privacy, 2020

PolygraphMR: Enhancing the Reliability and Dependability of CNNs.

Salar Latifi

Babak Zamirai

Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020

SIEVE: Speculative Inference on the Edge with Versatile Exportation.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Low-cost prediction-based fault protection strategy.

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers.

ACM Trans. Embed. Comput. Syst., 2019

Multi-objective Exploration for Practical Optimization Decisions in Binary Translation.

ACM Trans. Embed. Comput. Syst., 2019

Characterization of Unnecessary Computations in Web Applications.

Hossein Golestani

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Duality cache for data parallel acceleration.

Daichi Fujiki

Reetuparna Das

Proceedings of the 46th International Symposium on Computer Architecture, 2019

POSTER: Pairing Up CNNs for High Throughput Deep Learning.

Babak Zamirai

Salar Latifi

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs.

Jonathan Bailey

John Kloosterman

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Iterative Modulo Scheduling.

IEEE Micro, 2018

Rethinking Numerical Representations for Deep Neural Networks.

Marios C. Papaefthymiou

CoRR, 2018

Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation.

Shikai Li

Proceedings of the 32nd International Conference on Supercomputing, 2018

Low Cost Transient Fault Protection Using Loop Output Prediction.

Shikai Li

Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2018

In-Memory Data Parallel Processor.

Daichi Fujiki

Reetuparna Das

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

Mirage cores: the illusion of many out-of-order cores using in-order hardware.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Regless: just-in-time operand staging for GPUs.

John Kloosterman

Jonathan Beaumont

Jonathan Bailey

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission.

Michael A. Laurenzano

Lingjia Tang

Jason Mars

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.

Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

In-memory Data Flow Processor.

Daichi Fujiki

Reetuparna Das

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Exploring Fine-Grained Heterogeneity with Composite Cores.

IEEE Trans. Computers, 2016

Quality Control for Approximate Accelerators by Error Prediction.

IEEE Des. Test, 2016

A bypass first policy for energy-efficient last level caches.

Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Input responsiveness: using canary inputs to dynamically steer approximation.

Michael A. Laurenzano

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation.

Michael A. Laurenzano

Lingjia Tang

Jason Mars

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

BugMD: automatic mismatch diagnosis for bug triaging.

Proceedings of the 35th International Conference on Computer-Aided Design, 2016

2015

Using Graphics Processing Units in an LTE Base Station.

J. Signal Process. Syst., 2015

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration.

ACM Trans. Comput. Syst., 2015

Tango: Accelerating Mobile Applications through Flip-Flop Replication.

GetMobile Mob. Comput. Commun., 2015

ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling.

Proceedings of the International Conference for High Performance Computing, 2015

Accelerating Mobile Applications through Flip-Flop Replication.

Proceedings of the 13th Annual International Conference on Mobile Systems, 2015

DynaMOS: dynamic schedule migration for heterogeneous cores.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

WarpPool: sharing requests with inter-warp coalescing for throughput processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Rumba: an online quality management system for approximate computing.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Accelerating asynchronous programs through event sneak peek.

Gaurav Chadha

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Mascar: Speeding up GPU warps by reducing memory pitstops.

Ankit Sethia

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Chimera: Collaborative Preemption for Multitasking on a Shared GPU.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Fine Grain Cache Partitioning Using Per-Instruction Working Blocks.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Orchestrating Multiple Data-Parallel Kernels on Multiple Devices.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Scaling Performance via Self-Tuning Approximation for Graphics Engines.

ACM Trans. Comput. Syst., 2014

Leveraging GPUs using cooperative loop speculation.

ACM Trans. Archit. Code Optim., 2014

Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution.

Ankit Sethia

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Harnessing Soft Computations for Low-Budget Fault Tolerance.

Daya Shanker Khudia

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Embracing heterogeneity with dynamic core boosting.

Hyoun Kyu Cho

Proceedings of the Computing Frontiers Conference, CF'14, 2014

Paraprox: pattern-based approximation for data parallel applications.

Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Heterogeneous microarchitectures trump voltage scaling for low-power cores.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

VAST: the illusion of a large memory space for GPUs.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

D²MA: accelerating coarse-grained data transfer for GPUs.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

EFetch: optimizing instruction fetch for event-driven web applications.

Gaurav Chadha

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Eliminating Concurrency Bugs in Multithreaded Software: A New Approach Based on Discrete-Event Control.

IEEE Trans. Control. Syst. Technol., 2013

Optimal Liveness-Enforcing Control for a Class of Petri Nets Arising in Multithreaded Software.

IEEE Trans. Autom. Control., 2013

Concurrency bugs in multithreaded software: modeling and analysis using Petri nets.

Discret. Event Dyn. Syst., 2013

Architecting an LTE base station with graphics processing units.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2013

SAGE: self-tuning approximation for graphics engines.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Trace based phase prediction for tightly-coupled heterogeneous cores.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Low cost control flow protection using abstract control signatures.

Daya Shanker Khudia

Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2013

Parallelization techniques for implementing trellis algorithms on graphics processors.

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

WiBench: An open source kernel suite for benchmarking wireless systems.

Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Illusionist: Transforming lightweight cores into aggressive cores on demand.

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Efficient execution of augmented reality applications on mobile programmable accelerators.

Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

Instant profiling: Instrumentation sampling for profiling datacenter applications.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Practical lock/unlock pairing for concurrent programs.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

APOGEE: Adaptive prefetching on GPUs for energy efficiency.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

A Customized Processor for Energy Efficient Scientific Computing.

IEEE Trans. Computers, 2012

Adaptive input-aware compilation for graphics engines.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

COMET: Code Offload by Migrating Execution Transparently.

Mark S. Gordon

Zhuoqing Morley Mao

Xu Chen

Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012

Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Composite Cores: Pushing Heterogeneity Into a Core.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Dynamic acceleration of multithreaded program critical paths in near-threshold systems.

Hyoun Kyu Cho

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Efficient soft error protection for commodity embedded microprocessors using profile information.

Daya Shanker Khudia

Griffin Wright

Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012

Efficient performance scaling of future CGRAs for mobile applications.

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Process variation in near-threshold wide SIMD architectures.

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Runtime asynchronous fault tolerance via speculation.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Automatic speculative DOALL for clusters.

Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

When less is more (LIMO): controlled parallelism for improved efficiency.

Gaurav Chadha

Proceedings of the 15th International Conference on Compilers, 2012

Paragon: collaborative speculative loop execution on GPU and CPU.

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

SIMD defragmenter: efficient ILP realization on data-parallel architectures.

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

Analyzing the Next Generation Software Defined Radio for Future Architectures.

J. Signal Process. Syst., 2011

StageNet: A Reconfigurable Fabric for Constructing Dependable CMPs.

IEEE Trans. Computers, 2011

Maximizing Spare Utilization by Virtually Reorganizing Faulty Cache Lines.

IEEE Trans. Computers, 2011

Bundled execution of recurring traces for energy-efficient general purpose processing.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Encore: low-cost, fine-grained transient fault recovery.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Archipelago: A polymorphic cache design for enabling robust near-threshold operation.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Dynamically accelerating client-side web applications through decoupled execution.

Mojtaba Mehrara

Proceedings of the CGO 2011, 2011

Deadlock-avoidance control of multithreaded software: An efficient siphon-based algorithm for Gadara petri nets.

Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference, 2011

Sponge: portable stream programming on graphics engines.

Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

PEPSC: A Power-Efficient Processor for Scientific Computing.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Putting Faulty Cores to Work.

IEEE Micro, 2010

Mobile Supercomputers for the Next-Generation Cell Phone.

Computer, 2010

Supervisory control of software execution for failure avoidance: Experience from the Gadara project.

Proceedings of the 10th International Workshop on Discrete Event Systems, 2010

Erasing Core Boundaries for Robust and Configurable Performance.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Diet SODA: a power-efficient processor for digital cameras.

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

Necromancer: enhancing system throughput by animating dead cores.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors.

Proceedings of the High Performance Embedded Architectures and Compilers, 2010

StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric.

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Compilation techniques for CGRAs: exploring all parallelization approaches.

Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

Synthesis of maximally-permissive liveness-enforcing control policies for Gadara petri nets.

Proceedings of the 49th IEEE Conference on Decision and Control, 2010

Resource recycling: putting idle resources to work on a composable accelerator.

Proceedings of the 2010 International Conference on Compilers, 2010

Mighty-morphing power-SIMD.

Proceedings of the 2010 International Conference on Compilers, 2010

MacroSS: macro-SIMDization of streaming applications.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Shoestring: probabilistic soft error reliability on the cheap.

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

CoreGenesis: erasing core boundaries for robust and configurable performance.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

MEDICS: ultra-portable processing for medical image reconstruction.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Multicore compilation strategies and challenges.

IEEE Signal Process. Mag., 2009

Eliminating Concurrency Bugs with Control Engineering.

Computer, 2009

A dataflow-centric approach to design low power control paths in CGRAs.

Hyunchul Park

Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Power-efficient medical image processing using PUMA.

Ganesh S. Dasika

Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Parade: A versatile parallel architecture for accelerating pulse train clustering.

Amin Ansari

Dan Zhang

Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

Customizing wide-SIMD architectures for H.264.

Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

The theory of deadlock avoidance via discrete control.

Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009

Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory.

Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications.

Hyunchul Park

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

ZerehCache: armoring cache architectures in high defect density technologies.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures.

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, 2009

Enabling ultra low voltage system operation by tolerating on-chip cache failures.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

AnySP: anytime anywhere anyway signal processing.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Adaptive online testing for efficient hard fault detection.

Proceedings of the 27th International Conference on Computer Design, 2009

Bridging the computation gap between programmable processors and hardwired accelerators.

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Stream Compilation for Real-Time Embedded Multicore Systems.

Proceedings of the CGO 2009, 2009

Gadara nets: Modeling and analyzing lock allocation for deadlock avoidance in multithreaded software.

Proceedings of the 48th IEEE Conference on Decision and Control, 2009

CGRA express: accelerating execution using dynamic operation fusion.

Hyunchul Park

Proceedings of the 2009 International Conference on Compilers, 2009

Maximally permissive deadlock avoidance for multithreaded computer programs (Extended abstract).

Proceedings of the IEEE Conference on Automation Science and Engineering, 2009

Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures.

Proceedings of the PACT 2009, 2009

2008

Reliable Systems on Unreliable Fabrics.

IEEE Des. Test Comput., 2008

A parameterized dataflow language extension for embedded streaming systems.

Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Orchestrating the execution of stream programs on multicore platforms.

Manjunath Kudlur

Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Gadara: Dynamic Deadlock Avoidance for Multithreaded Programs.

Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008

From SODA to scotch: The evolution of a wireless baseband processor.

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

The StageNet fabric for constructing resilient multicore systems.

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

VEAL: Virtualized Execution Accelerator for Loops.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Analyzing the scalability of SIMD for the next generation software defined radio.

Proceedings of the IEEE International Conference on Acoustics, 2008

Uncovering hidden loop level parallelism in sequential applications.

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

DVFS in loop accelerators using BLADES.

Proceedings of the 45th Design Automation Conference, 2008

Modulo scheduling for highly customized datapaths to increase hardware reusability.

Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Optimus: efficient realization of streaming applications on FPGAs.

Proceedings of the 2008 International Conference on Compilers, 2008

StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems.

Proceedings of the 2008 International Conference on Compilers, 2008

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Architecting a reliable CMP switch architecture.

Kypros Constantinides

ACM Trans. Archit. Code Optim., 2007

SODA: A High-Performance DSP Architecture for Software-Defined Radio.

IEEE Micro, 2007

Reliability: Fallacy or Reality?

IEEE Micro, 2007

The Next Generation Challenge for Software Defined Radio.

Proceedings of the Embedded Computer Systems: Architectures, 2007

Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Self-calibrating Online Wearout Detection.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Compiler-managed partitioned data caches for low power.

Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

Code and data partitioning for fine-grain parallelism.

Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications.

Steven A. Lieberman

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping.

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping.

Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Hierarchical coarse-grained stream compilation for software defined radio.

Proceedings of the 2007 International Conference on Compilers, 2007

2006

Design and Implementation of Turbo Decoders for Software Defined Radio.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2006

SODA: A Low-power Architecture For Software Radio.

Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

BulletProof: a defect-tolerant CMP switch architecture.

Kypros Constantinides

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Streamroller: automatic synthesis of prescribed throughput accelerator pipelines.

Manjunath Kudlur

Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

Increasing hardware efficiency with multifunction loop accelerators.

Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

Compiler-directed Data Partitioning for Multicluster Processors.

Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures.

Proceedings of the 2006 International Conference on Compilers, 2006

Scalable subgraph mapping for acyclic computation accelerators.

Proceedings of the 2006 International Conference on Compilers, 2006

Cost-efficient soft error protection for embedded microprocessors.

Proceedings of the 2006 International Conference on Compilers, 2006

2005

Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 2005

Automated Custom Instruction Generation for Domain-Specific Processor Acceleration.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 2005

Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System.

[BibT_eX]

[DOI]

Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors.

[BibT_eX]

[DOI]

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

Software Defined Radio - A High Performance Embedded Challenge.

[BibT_eX]

[DOI]

Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache.

[BibT_eX]

[DOI]

Pracheeti D. Nagarkar

Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Exploring the design space of LUT-based transparent accelerators.

[BibT_eX]

[DOI]

Proceedings of the 2005 International Conference on Compilers, 2005

A Distributed Control Path Architecture for VLIW Processors.

[BibT_eX]

[DOI]

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004

Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors.

[BibT_eX]

[DOI]

IEEE Micro, 2004

Mobile Supercomputers.

[BibT_eX]

[DOI]

Computer, 2004

Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization.

[BibT_eX]

[DOI]

Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Trimaran: An Infrastructure for Research in Instruction-Level Parallelism.

[BibT_eX]

[DOI]

Lakshmi N. Chakrapani

Proceedings of the Languages and Compilers for High Performance Computing, 2004

Memory system design space exploration for low-power, real-time speech recognition.

[BibT_eX]

[DOI]

Rajeev Krishna

Todd M. Austin

Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004

Probabilistic Predicate-Aware Modulo Scheduling.

[BibT_eX]

[DOI]

Mikhail Smelyanskiy

Edward S. Davidson

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths.

[BibT_eX]

[DOI]

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators.

[BibT_eX]

[DOI]

Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004

2003

Automatic Design of Application Specific Instruction Set Extensions Through Dataflow Graph Exploration.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2003

Region-based hierarchical operation partitioning for multicluster processors.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, 2003

Processor Acceleration Through Automated Instruction Set Customization.

[BibT_eX]

[DOI]

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints.

[BibT_eX]

[DOI]

Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

Increasing the number of effective registers in a low-power processor using a windowed register file.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Compilers, 2003

Architectural optimizations for low-power, real-time speech recognition.

[BibT_eX]

[DOI]

Rajeev Krishna

Todd M. Austin

Proceedings of the International Conference on Compilers, 2003

Systematic Register Bypass Customization for Application-Specific Processors.

[BibT_eX]

[DOI]

Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003

2002

PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators.

[BibT_eX]

[DOI]

J. VLSI Signal Process., 2002

2001

Bitwidth cognizant architecture synthesis of custom hardware accelerators.

[BibT_eX]

[DOI]

Robert Schreiber

Timothy Sherwood

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

2000

Code size minimization and retargetable assembly for custom EPIC and VLIW instruction formats.

[BibT_eX]

[DOI]

Shail Aditya

B. Ramakrishna Rau

ACM Trans. Design Autom. Electr. Syst., 2000

High-Level Synthesis of Nonprogrammable Hardware Accelerators.

[BibT_eX]

[DOI]

Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000

1999

The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication.

[BibT_eX]

[DOI]

David I. August

Int. J. Parallel Program., 1999

Control CPR: A Branch Height Reduction Optimization for EPIC Architectures.

[BibT_eX]

[DOI]

Richard Johnson

Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999

Automatic and Efficient Evaluation of Memory Hierarchies for Embedded Systems.

[BibT_eX]

[DOI]

Santosh G. Abraham

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

The Program Decision Logic Approach to Predicated Execution.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

1998

IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors.

[BibT_eX]

[DOI]

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

1997

Exploiting Instruction Level Parallelism in the Presence of Conditional Branches

[BibT_eX]

[DOI]

PhD thesis, 1997

A Framework for Balancing Control Flow and Predication.

[BibT_eX]

[DOI]

David I. August

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

1996

Compiler Synthesized Dynamic Branch Prediction.

[BibT_eX]

[DOI]

Balas K. Natarajan

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

1995

Three Architectural Models for Compiler-Controlled Speculative Execution.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1995

Compiler technology for future microprocessors.

[BibT_eX]

[DOI]

Proc. IEEE, 1995

A Comparison of Full and Partial Predicated Execution Support for ILP Processors.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A study of the effects of compiler-controlled speculation on instruction and data caches.

[BibT_eX]

[DOI]

Roger A. Bringmann

Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

1994

Profile-assisted instruction scheduling.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 1994

Characterizing the impact of predicated execution on branch prediction.

[BibT_eX]

[DOI]

Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Dynamic Memory Disambiguation Using the Memory Conflict Buffer.

[BibT_eX]

[DOI]

Proceedings of ASPLOS-VI, 1994

1993

Sentinel Scheduling for VLIW and Superscalar Processors.

[BibT_eX]

[DOI]

ACM Trans. Comput. Syst., 1993

The superblock: An effective technique for VLIW and superscalar compilation.

[BibT_eX]

[DOI]

J. Supercomput., 1993

Reverse If-Conversion.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993

Superblock formation using static program analysis.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Speculative execution exception recovery using write-back suppression.

[BibT_eX]

[DOI]

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

1992

Profile-guided Automatic Inline Expansion for C Programs.

[BibT_eX]

[DOI]

Softw. Pract. Exp., 1992

Compiler Code Transformations for Superscalar-Based High Performance Systems.

[BibT_eX]

[DOI]

Proceedings of Supercomputing '92, 1992

Effective compiler support for predicated execution using the hyperblock.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

An efficient architecture for loop based data preloading.

[BibT_eX]

[DOI]

Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

Using Profile Information to Assist Advanced Compiler Optimization and Scheduling.

[BibT_eX]

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 1992

Tolerating data access latency with register preloading.

[BibT_eX]

[DOI]

Proceedings of the 6th international conference on Supercomputing, 1992

Tolerating First Level Memory Access Latency in High-Performance Systems.

[BibT_eX]

William Y. Chen

Proceedings of the 1992 International Conference on Parallel Processing, 1992

Sentinel Scheduling for VLIW and Superscalar Processors.

[BibT_eX]

[DOI]

Proceedings of ASPLOS-V, 1992

1991

Using Profile Information to Assist Classic Code Optimizations.

[BibT_eX]

[DOI]

Pohua P. Chang