We stand with Ukraine

Keren Zhou

Orcid: 0000-0002-7977-3182

Affiliations:

George Mason University, VA, USA
OpenAI
Rice University, TX, USA (PhD)

According to our database¹, Keren Zhou authored at least 40 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

TenProf: A Tensor-Centric Profiler for Deep Learning Workload Analysis and Optimization.

Proceedings of the 40th ACM International Conference on Supercomputing, 2026

Proton: Towards Multi-level, Adaptive Profiling for Triton.

Philippe Tillet

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

PASTA: A Modular Program Analysis Tool Framework for Accelerators.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using 𝔽₂.

Mario Lezcano Casado, Adam P. Goucher, Akhmed Rakhmati, Pawel Szczerbuk

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using 𝔽₂.

Akhmed Rakhmati, Pawel Szczerbuk

CoRR, May, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.

CoRR, May, 2025

Do Large Language Models Understand Performance Optimization?

Oscar R. Hernandez

CoRR, March, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.

Daniels Johnson

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

Triton-Viz: Visualizing GPU Programming in AI Courses.

Alexander M. Rush

Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

Comprehensive Evaluation of LLMs in HPC Code Performance Optimization.

Oscar R. Hernandez

Workshop Proceedings of the 54th International Conference on Parallel Processing, 2025

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads.

CoRR, 2024

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.

Venkatram Vishwanath

Proceedings of the 2024 USENIX Annual Technical Conference, 2024

SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform.

Anshumali Shrivastava

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks.

Karthik Ganapathi Subramanian

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing.

Anshumali Shrivastava

Proceedings of the International Conference on Machine Learning, 2023

DrGPUM: Guiding Memory Optimization for GPU-Accelerated Applications.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications.

John M. Mellor-Crummey

IEEE Trans. Parallel Distributed Syst., 2022

Paw-Net: Stacking ensemble deep learning for segmenting scanning electron microscopy images of fine-grained shale samples.

Comput. Geosci., 2022

Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing.

Anshumali Shrivastava

CoRR, 2022

Accelerating high-order stencils on GPUs.

John M. Mellor-Crummey, Mauricio Araya-Polo

Concurr. Comput. Pract. Exp., 2022

Low overhead and context sensitive profiling of GPU-accelerated applications.

Jonathon M. Anderson, John M. Mellor-Crummey

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

ValueExpert: exploring value patterns in GPU-accelerated applications.

John M. Mellor-Crummey

Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021

Measurement and analysis of GPU-accelerated applications with HPCToolkit.

Laksono Adhianto, Jonathon M. Anderson, John M. Mellor-Crummey

Parallel Comput., 2021

Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs.

John M. Mellor-Crummey

Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2021

Outcomes of OpenMP Hackathon: OpenMP Application Experiences with the Offloading Model (Part II).

Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

Outcomes of OpenMP Hackathon: OpenMP Application Experiences with the Offloading Model (Part I).

Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

GPA: A GPU Performance Advisor Based on Instruction Sampling.

John M. Mellor-Crummey

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

GVProf: a value profiler for GPU-based clusters.

John M. Mellor-Crummey

Proceedings of the International Conference for High Performance Computing, 2020

A tool for top-down performance analysis of GPU-accelerated applications.

John M. Mellor-Crummey

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Tools for top-down performance analysis of GPU-accelerated applications.

Mark W. Krentel, John M. Mellor-Crummey

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019

A Tool for Performance Analysis of GPU-Accelerated Applications.

John M. Mellor-Crummey

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018

Quadboost: A Scalable Concurrent Quadtree.

IEEE Trans. Parallel Distributed Syst., 2018

2017

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

A performance analysis framework for exploiting GPU microarchitectural capability.

Proceedings of the International Conference on Supercomputing, 2017

2015

Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce.

Proceedings of the IEEE International Conference on Data Mining Workshop, 2015

BF-MapReduce: A Bloom Filter Based Efficient Lightweight Search.

Proceedings of the IEEE Conference on Collaboration and Internet Computing, 2015
