Gennady Pekhimenko

ORCID: 0000-0002-3839-0919

Affiliations:
  • University of Toronto
  • Microsoft Research
  • Carnegie Mellon University (former)


According to our database, Gennady Pekhimenko authored at least 93 papers between 2010 and 2024.

Bibliography

2024
Accelerating Graph Neural Networks on Real Processing-In-Memory Systems.
CoRR, 2024

Minuet: Accelerating 3D Sparse Convolutions on GPUs.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
Federated benchmarking of medical artificial intelligence with MedPerf.
Nat. Mac. Intell., July, 2023

Lightweight Frequency-Based Tiering for CXL Memory Systems.
CoRR, 2023

The Synergy of Speculative Decoding and Batching in Serving Large Language Models.
CoRR, 2023

Speeding up Fourier Neural Operators via Mixed Precision.
CoRR, 2023

Arbitor: A Numerically Accurate Hardware Emulation Tool for DNN Accelerators.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

TorchProbe: Fuzzing Dynamic Deep Learning Compilers.
Proceedings of the Programming Languages and Systems - 21st Asian Symposium, 2023

2022
Optimizing Data Collection in Deep Reinforcement Learning.
CoRR, 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DietCode: Automatic Optimization for Dynamic Tensor Programs.
Proceedings of Machine Learning and Systems 2022, 2022

Keynote Talk 1: Efficient DNN Training at Scale: from Algorithms to Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

How to validate Machine Learning Models Prior to Deployment: Silent trial protocol for evaluation of real-time models at ICU.
Proceedings of the Conference on Health, Inference, and Learning, 2022

Automatic Horizontal Fusion for GPU Kernels.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Gretch: A Hardware Prefetcher for Graph Analytics.
ACM Trans. Archit. Code Optim., 2021

MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation.
CoRR, 2021

Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach.
CoRR, 2021

Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Distributed Deep Learning In Open Collaborations.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick.
Proceedings of Machine Learning and Systems 2021, 2021

IOS: Inter-Operator Scheduler for CNN Acceleration.
Proceedings of Machine Learning and Systems 2021, 2021

Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models.
Proceedings of Machine Learning and Systems 2021, 2021

RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads.
Proceedings of Machine Learning and Systems 2021, 2021

FPRaker: A Processing Element For Accelerating Neural Network Training.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

LifeStream: a high-performance stream processing engine for periodic streams.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
LifeStream: A High-performance Stream Processing Engine for Waveform Data.
CoRR, 2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference.
CoRR, 2020

Multi-node Bert-pretraining: Cost-efficient Approach.
CoRR, 2020

Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training.
Proceedings of the UIST '20: The 33rd Annual ACM Symposium on User Interface Software and Technology, 2020

BPPSA: Scaling Back-propagation by Parallel Scan Algorithm.
Proceedings of Machine Learning and Systems 2020, 2020

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
MLPerf Training Benchmark.
CoRR, 2019

Scaling Back-propagation by Parallel Scan Algorithm.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Priority-based Parameter Propagation for Distributed DNN Training.
Proceedings of Machine Learning and Systems 2019, 2019

Janus: optimizing memory and storage support for non-volatile memory systems.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Towards Breaking the Memory Bandwidth Wall Using Approximate Value Prediction.
Proceedings of the Approximate Circuits, Methodologies and CAD., 2019

2018
EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization.
CoRR, 2018

Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency.
CoRR, 2018

RowClone: Accelerating Data Movement and Initialization Using DRAM.
CoRR, 2018

SoftMC: Practical DRAM Characterization Using an FPGA-Based Infrastructure.
CoRR, 2018

Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
CoRR, 2018

Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins.
CoRR, 2018

Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance.
CoRR, 2018

TBD: Benchmarking and Analyzing Deep Neural Network Training.
CoRR, 2018

Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management.
CoRR, 2018

TerseCades: Efficient Data Compression in Stream Processing.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Gist: Efficient Data Encoding for Deep Neural Network Training.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Benchmarking and Analyzing Deep Neural Network Training.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Compiler-driven performance workshop.
Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, 2018

2017
Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms.
Proc. ACM Meas. Anal. Comput. Syst., 2017

StreamBox: Modern Stream Processing on a Multicore Machine.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016
RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads.
ACM Trans. Archit. Code Optim., 2016

Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost.
ACM Trans. Archit. Code Optim., 2016

Mitigating the Memory Bottleneck With Approximate Load Value Prediction.
IEEE Des. Test, 2016

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps.
CoRR, 2016

Practical Data Compression for Modern Memory Hierarchies.
CoRR, 2016

Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips.
CoRR, 2016

Adaptive-Latency DRAM (AL-DRAM).
CoRR, 2016

Optimal seed solver: optimizing seed selection in read mapping.
Bioinform., 2016

Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization.
Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

Zorua: A holistic approach to resource virtualization in GPUs.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

A case for toggle-aware compression for GPU systems.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ChargeCache: Reducing DRAM latency by exploiting row access locality.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface.
CoRR, 2015

Toggle-Aware Compression for GPUs.
IEEE Comput. Archit. Lett., 2015

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.
Bioinform., 2015

PocketTrend: Timely Identification and Delivery of Trending Search Content to Mobile Users.
Proceedings of the 24th International Conference on World Wide Web, 2015

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Exploiting compressed block size as an indicator of future reuse.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Adaptive-latency DRAM: Optimizing DRAM timing for the common-case.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

2014
Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Linearly compressed pages: a low-complexity, low-latency main memory compression framework.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012
Base-delta-immediate compression: practical data compression for on-chip caches.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Linearly compressed pages: a main memory compression framework with low complexity and low latency.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2010
Efficient Program Compilation Through Machine Learning Techniques.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

