Huiyang Zhou

Orcid: 0000-0003-2133-0722

According to our database, Huiyang Zhou authored at least 102 papers between 2001 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Maximum Likelihood Quantum Error Mitigation for Algorithms with a Single Correct Output.
CoRR, 2024

Salus: Efficient Security Support for CXL-Expanded GPU Memory.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Deep learning based data prefetching in CPU-GPU unified virtual memory.
J. Parallel Distributed Comput., April, 2023

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory.
J. Grid Comput., March, 2023

Dynamic Runtime Assertions in Quantum Ternary Systems.
CoRR, 2023

Folding-Free ZNE: A Comprehensive Quantum Zero-Noise Extrapolation Approach for Mitigating Depolarizing and Decoherence Noise.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

PBVR: Physically Based Rendering in Virtual Reality.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Enhancing Virtual Distillation with Circuit Cutting for Quantum Error Mitigation.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Plutus: Bandwidth-Efficient Memory Security for GPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
A Survey of GPU Multitasking Methods Supported by Hardware Architecture.
IEEE Trans. Parallel Distributed Syst., 2022

An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory.
CoRR, 2022

Deep Learning based Data Prefetching in CPU-GPU Unified Virtual Memory.
CoRR, 2022

LITE: a low-cost practical inter-operable GPU TEE.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Exploiting Quantum Assertions for Error Mitigation and Quantum Program Debugging.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Adaptive Security Support for Heterogeneous Memory on GPUs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Not All SWAPs Have the Same Cost: A Case for Optimization-Aware Qubit Routing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Analyzing Secure Memory Architecture for GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Message from the Program Chairs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2021

PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Systematic Approaches for Precise and Approximate Quantum State Runtime Assertion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Relaxed Peephole Optimization: A Novel Compiler Optimization for Quantum Circuits.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU.
Future Gener. Comput. Syst., 2020

Streamlining Integrity Tree Updates for Secure Persistent Non-Volatile Memory.
CoRR, 2020

Exploring Convolution Neural Network for Branch Prediction.
IEEE Access, 2020

LARQ: Learning to Ask and Rewrite Questions for Community Question Answering.
Proceedings of the Natural Language Processing and Chinese Computing, 2020

Persist Level Parallelism: Streamlining Integrity Tree Updates for Secure Persistent Memory.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Scalable and Fast Lazy Persistency on GPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Reliability Modeling of NISQ-Era Quantum Computers.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution.
ACM Trans. Archit. Code Optim., 2019

Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation.
IEEE Comput. Archit. Lett., 2019

In-Place Zero-Space Memory Protection for CNN.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Scatter-and-Gather Revisited: High-Performance Side-Channel-Resistant AES on GPUs.
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019

Exploring Memory Persistency Models for GPUs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP.
ACM Trans. Archit. Code Optim., 2018

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017
EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs.
Proceedings of the 54th Annual Design Automation Conference, 2017

POSTER: Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
A Cross-Platform SpMV Framework on Many-Core Architectures.
ACM Trans. Archit. Code Optim., 2016

Enabling efficient preemption for SIMT architectures with lightweight context switching.
Proceedings of the International Conference for High Performance Computing, 2016

Optimizing memory efficiency for deep convolutional neural networks on GPUs.
Proceedings of the International Conference for High Performance Computing, 2016

Selectively GPU Cache Bypassing for Un-Coalesced Loads.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Tuning Stencil codes in OpenCL for FPGAs.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

A model-driven approach to warp/thread-block level GPU cache bypassing.
Proceedings of the 53rd Annual Design Automation Conference, 2016

OpenCL-based erasure coding on heterogeneous architectures.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

2015
CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications.
J. Comput. Sci. Technol., 2015

Analyzing graphics processor unit (GPU) instruction set architectures.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Locality-Driven Dynamic GPU Cache Bypassing.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Computing in 3D.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

Automatic data placement into GPU on-chip memory resources.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

Revisiting ILP Designs for Throughput-Oriented GPGPU Architecture.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Computing in 3D.
Proceedings of the 2015 International 3D Systems Integration Conference, 2015

2014
RACB: Resource Aware Cache Bypass on GPUs.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

CUDA-NP: realizing nested thread-level parallelism in GPGPU applications.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

yaSpMV: yet another SpMV framework on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A Case for a Flexible Scalar Unit in SIMT Architecture.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Warp-level divergence in GPUs: Characterization, impact, and mitigation.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

3D-enabled customizable embedded computer (3DECC).
Proceedings of the 2014 International 3D Systems Integration Conference, 2014

A Highly Efficient FFT Using Shared-Memory Multiplexing.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Architecting against Software Cache-Based Side-Channel Attacks.
IEEE Trans. Computers, 2013

Locality principle revisited: A probability-based quantitative approach.
J. Parallel Distributed Comput., 2013

The Implementation of a High Performance GPGPU Compiler.
Int. J. Parallel Program., 2013

Analyzing locality of memory references in GPU architectures.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Adaptive Cache Bypassing for Inclusive Last Level Caches.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement.
Proceedings of the International Conference on Supercomputing, 2013

2012
A unified optimizing compiler framework for different GPGPU architectures.
ACM Trans. Archit. Code Optim., 2012

Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs.
Proceedings of the 41st International Conference on Parallel Processing, 2012

CPU-assisted GPGPU on fused CPU-GPU architectures.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Shared memory multiplexing: a novel way to improve GPGPU throughput.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Many-thread aware instruction-level parallelism: architecting shader cores for GPU computing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Combining Local and Global History for High Performance Data Prefetching.
J. Instr. Level Parallelism, 2011

Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
An optimizing compiler for GPGPU programs with input-data sharing.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

A GPGPU compiler for memory optimization and parallelism management.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

Improving privacy and lifetime of PCM-based main memory.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Accelerating MATLAB Image Processing Toolbox functions on GPUs.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009
Hardware-software integrated approaches to defend against software cache-based side channel attacks.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

Understanding software approaches for GPGPU reliability.
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

2008
Address-branch correlation: A novel locality for long-latency hard-to-predict branches.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Deconstructing new cache designs for thwarting software cache-based side channel attacks.
Proceedings of the 2nd ACM Workshop on Computer Security Architecture, 2008

2007
Optimizing Dual-Core Execution for Power Efficiency and Transient-Fault Recovery.
IEEE Trans. Parallel Distributed Syst., 2007

PMPM: Prediction by Combining Multiple Partial Matches.
J. Instr. Level Parallelism, 2007

Unified Architectural Support for Soft-Error Protection or Software Bug Detection.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Using Indexing Functions to Reduce Conflict Aliasing in Branch Prediction Tables.
IEEE Trans. Computers, 2006

A case for fault tolerance and performance enhancement using chip multi-processors.
IEEE Comput. Archit. Lett., 2006

Efficient Transient-Fault Tolerance for Multithreaded Processors using Dual-Thread Execution.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Improving software security via runtime instruction-level taint checking.
Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

2005
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction.
IEEE Trans. Computers, 2005

Adaptive Information Processing: An Effective Way to Improve Perceptron Predictors.
J. Instr. Level Parallelism, 2005

Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2003
Adaptive mode control: A static-power-efficient cache design.
ACM Trans. Embed. Comput. Syst., 2003

Detecting Global Stride Locality in Value Streams.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

2002
Code Size Efficiency in Global Scheduling for ILP Processors.
Proceedings of the 6th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-6 2002), 2002

2001
Tree Traversal Scheduling: A Global Instruction Scheduling Technique for VLIW/EPIC Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2001
