Huiyang Zhou

CoRR, 2024

Maximum Likelihood Quantum Error Mitigation for Algorithms with a Single Correct Output.

Dror Baron

Hrushikesh Pramod Patil

CoRR, 2024

SEFsim: A Statistically-Guided Fast DRAM Simulator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

QuTracer: Mitigating Quantum Gate and Measurement Errors by Tracing Subsets of Qubits.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Tetris: A Compilation Framework for VQA Applications in Quantum Computing.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Salus: Efficient Security Support for CXL-Expanded GPU Memory.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023

Deep learning based data prefetching in CPU-GPU unified virtual memory.

J. Parallel Distributed Comput., April, 2023

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory.

J. Grid Comput., March, 2023

Dynamic Runtime Assertions in Quantum Ternary Systems.

Ehsan Faghih

CoRR, 2023

Folding-Free ZNE: A Comprehensive Quantum Zero-Noise Extrapolation Approach for Mitigating Depolarizing and Decoherence Noise.

Hrushikesh Pramod Patil

Peiyi Li

Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

PBVR: Physically Based Rendering in Virtual Reality.

Yavuz Selim Tozlu

Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Enhancing Virtual Distillation with Circuit Cutting for Quantum Error Mitigation.

Peiyi Li

Hrushikesh Pramod Patil

Paul D. Hovland

Proceedings of the 41st IEEE International Conference on Computer Design, 2023

SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers.

Alexander Freij

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Plutus: Bandwidth-Efficient Memory Security for GPUs.

Rahaf Abdullah

Amro Awad

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

A Survey of GPU Multitasking Methods Supported by Hardware Architecture.

IEEE Trans. Parallel Distributed Syst., 2022

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory.

Xinjian Long

Xiangyang Gong

CoRR, 2022

Deep Learning based Data Prefetching in CPU-GPU Unified Virtual Memory.

Xinjian Long

Xiangyang Gong

CoRR, 2022

LITE: a low-cost practical inter-operable GPU TEE.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Adaptive Security Support for Heterogeneous Memory on GPUs.

Shougang Yuan

Amro Awad

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing.

Peiyi Li

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory.

Alexander Freij

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Analyzing Secure Memory Architecture for GPUs.

Shougang Yuan

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Message from the Program Chairs.

Tim Rogers

Proceedings of the IEEE International Symposium on Workload Characterization, 2021

PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata.

Shougang Yuan

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint.

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits.

Luciano Bello

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU.

Future Gener. Comput. Syst., 2020

Streamlining Integrity Tree Updates for Secure Persistent Non-Volatile Memory.

CoRR, 2020

Exploring Convolution Neural Network for Branch Prediction.

IEEE Access, 2020

LARQ: Learning to Ask and Rewrite Questions for Community Question Answering.

Proceedings of the Natural Language Processing and Chinese Computing, 2020

Persist Level Parallelism: Streamlining Integrity Tree Updates for Secure Persistent Memory.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Scalable and Fast Lazy Persistency on GPUs.

Keiji Kimura

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Reliability Modeling of NISQ-Era Quantum Computers.

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation.

Gregory T. Byrd

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution.

ACM Trans. Archit. Code Optim., 2019

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation.

Gregory T. Byrd

IEEE Comput. Archit. Lett., 2019

In-Place Zero-Space Memory Protection for CNN.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs.

Zhen Lin

Utkarsh Mathur

Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019

Exploring Memory Persistency Models for GPUs.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP.

Zhen Lin

Michael Mantor

ACM Trans. Archit. Code Optim., 2018

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs.

Proceedings of the 54th Annual Design Automation Conference, 2017

POSTER: Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

A Cross-Platform SpMV Framework on Many-Core Architectures.

ACM Trans. Archit. Code Optim., 2016

Enabling efficient preemption for SIMT architectures with lightweight context switching.

Zhen Lin

Lars Nyland

Proceedings of the International Conference for High Performance Computing, 2016

Optimizing memory efficiency for deep convolutional neural networks on GPUs.

Proceedings of the International Conference for High Performance Computing, 2016

Selectively GPU Cache Bypassing for Un-Coalesced Loads.

Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Tuning Stencil codes in OpenCL for FPGAs.

Qi Jia

Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A model-driven approach to warp/thread-block level GPU cache bypassing.

Proceedings of the 53rd Annual Design Automation Conference, 2016

OpenCL-based erasure coding on heterogeneous architectures.

Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications.

Chao Li

J. Comput. Sci. Technol., 2015

Analyzing graphics processor unit (GPU) instruction set architectures.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Locality-Driven Dynamic GPU Cache Bypassing.

Siva Kumar Sastry Hari

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing.

Saurabh Gupta

Proceedings of the 44th International Conference on Parallel Processing, 2015

Computing in 3D.

Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

Automatic data placement into GPU on-chip memory resources.

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Computing in 3D.

Proceedings of the 2015 International 3D Systems Integration Conference, 2015

2014

RACB: Resource Aware Cache Bypass on GPUs.

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

CUDA-NP: realizing nested thread-level parallelism in GPGPU applications.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

yaSpMV: yet another SpMV framework on GPUs.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs.

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A Case for a Flexible Scalar Unit in SIMT Architecture.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Warp-level divergence in GPUs: Characterization, impact, and mitigation.

Ping Xiang

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

3D-enabled customizable embedded computer (3DECC).

Proceedings of the 2014 International 3D Systems Integration Conference, 2014

A Highly Efficient FFT Using Shared-Memory Multiplexing.

Proceedings of the Numerical Computations with GPUs, 2014

2013

Architecting against Software Cache-Based Side-Channel Attacks.

IEEE Trans. Computers, 2013

Locality principle revisited: A probability-based quantitative approach.

J. Parallel Distributed Comput., 2013

The Implementation of a High Performance GPGPU Compiler.

Int. J. Parallel Program., 2013

Analyzing locality of memory references in GPU architectures.

Saurabh Gupta

Ping Xiang

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Adaptive Cache Bypassing for Inclusive Last Level Caches.

Saurabh Gupta

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement.

Proceedings of the International Conference on Supercomputing, 2013

2012

A unified optimizing compiler framework for different GPGPU architectures.

ACM Trans. Archit. Code Optim., 2012

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs.

Proceedings of the 41st International Conference on Parallel Processing, 2012

CPU-assisted GPGPU on fused CPU-GPU architectures.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Shared memory multiplexing: a novel way to improve GPGPU throughput.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Combining Local and Global History for High Performance Data Prefetching.

J. Instr. Level Parallelism, 2011

Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

An optimizing compiler for GPGPU programs with input-data sharing.

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

A GPGPU compiler for memory optimization and parallelism management.

Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

Improving privacy and lifetime of PCM-based main memory.

Jingfei Kong

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Accelerating MATLAB Image Processing Toolbox functions on GPUs.

Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009

Hardware-software integrated approaches to defend against software cache-based side channel attacks.

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging.

Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

Understanding software approaches for GPGPU reliability.

Mike Mantor

Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

2008

Address-branch correlation: A novel locality for long-latency hard-to-predict branches.

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Deconstructing new cache designs for thwarting software cache-based side channel attacks.

Proceedings of the 2nd ACM Workshop on Computer Security Architecture, 2008

2007

Optimizing Dual-Core Execution for Power Efficiency and Transient-Fault Recovery.

IEEE Trans. Parallel Distributed Syst., 2007

PMPM: Prediction by Combining Multiple Partial Matches.

J. Instr. Level Parallelism, 2007

Unified Architectural Support for Soft-Error Protection or Software Bug Detection.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Using Indexing Functions to Reduce Conflict Aliasing in Branch Prediction Tables.

Yi Ma

IEEE Trans. Computers, 2006

A case for fault tolerance and performance enhancement using chip multi-processors.

IEEE Comput. Archit. Lett., 2006

Efficient Transient-Fault Tolerance for Multithreaded Processors using Dual-Thread Execution.

Yi Ma

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Improving software security via runtime instruction-level taint checking.

Jingfei Kong

Cliff Changchun Zou

Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005

Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction.

IEEE Trans. Computers, 2005

Adaptive Information Processing: An Effective Way to Improve Perceptron Predictors.

J. Instr. Level Parallelism, 2005

Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window.

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2003

Adaptive mode control: A static-power-efficient cache design.

ACM Trans. Embed. Comput. Syst., 2003

Detecting Global Stride Locality in Value Streams.

Jill Flanagan

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

2002

Code Size Efficiency in Global Scheduling for ILP Processors.

Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

2001

Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors.

Matthew D. Jennings