Keren Zhou

Orcid: 0000-0002-7977-3182

Affiliations:
  • George Mason University, VA, USA
  • OpenAI
  • Rice University, TX, USA (PhD)


According to our database, Keren Zhou authored at least 39 papers between 2015 and 2026.

Bibliography

2026
Proton: Towards Multi-level, Adaptive Profiling for Triton.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

PASTA: A Modular Program Analysis Tool Framework for Accelerators.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using 𝔽₂.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025
Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using 𝔽₂.
CoRR, May, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025

Do Large Language Models Understand Performance Optimization?
CoRR, March, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

Triton-Viz: Visualizing GPU Programming in AI Courses.
Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

Comprehensive Evaluation of LLMs in HPC Code Performance Optimization.
Proceedings of the Workshop Proceedings of the 54th International Conference on Parallel Processing, 2025

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.
CoRR, 2024

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024


2023
Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing.
Proceedings of the International Conference on Machine Learning, 2023

DrGPUM: Guiding Memory Optimization for GPU-Accelerated Applications.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications.
IEEE Trans. Parallel Distributed Syst., 2022

Paw-Net: Stacking ensemble deep learning for segmenting scanning electron microscopy images of fine-grained shale samples.
Comput. Geosci., 2022

Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing.
CoRR, 2022

Accelerating high-order stencils on GPUs.
Concurr. Comput. Pract. Exp., 2022

Low overhead and context sensitive profiling of GPU-accelerated applications.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

ValueExpert: exploring value patterns in GPU-accelerated applications.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
Measurement and analysis of GPU-accelerated applications with HPCToolkit.
Parallel Comput., 2021

Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2021



GPA: A GPU Performance Advisor Based on Instruction Sampling.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
GVProf: a value profiler for GPU-based clusters.
Proceedings of the International Conference for High Performance Computing, 2020

A tool for top-down performance analysis of GPU-accelerated applications.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Tools for top-down performance analysis of GPU-accelerated applications.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019
A Tool for Performance Analysis of GPU-Accelerated Applications.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
Quadboost: A Scalable Concurrent Quadtree.
IEEE Trans. Parallel Distributed Syst., 2018

2017
Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.
Proceedings of the International Conference on Supercomputing, 2017

2015
Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce.
Proceedings of the IEEE International Conference on Data Mining Workshop, 2015

BF-MapReduce: A Bloom Filter Based Efficient Lightweight Search.
Proceedings of the IEEE Conference on Collaboration and Internet Computing, 2015

