Mark Stephenson

  • NVIDIA, Austin, TX, USA
  • IBM Research, Austin, TX, USA
  • Massachusetts Institute of Technology, Cambridge, MA, USA (PhD)

According to our database, Mark Stephenson authored at least 28 papers between 2000 and 2022.

GPU Subwarp Interleaving.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Cooperative Profile Guided Optimizations.
Comput. Graph. Forum, 2021

PGZ: automatic zero-value code specialization.
Proceedings of the CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, 2021

Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications.
ACM Trans. Archit. Code Optim., 2020

AZP: Automatic Specialization for Zero Values in Gaming Applications.
CoRR, 2020

Estimating Silent Data Corruption Rates Using a Two-Level Model.
CoRR, 2020

Speculative reconvergence for improved SIMT efficiency.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs.
ACM Trans. Archit. Code Optim., 2019

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Software-Directed Techniques for Improved GPU Register File Utilization.
ACM Trans. Archit. Code Optim., 2018

SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Towards high performance paged memory for GPUs.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Flexible software profiling of GPU architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Page Placement Strategies for GPUs within Heterogeneous Memory Systems.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A study of application-level recovery methods for transient network faults.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

The power 775 architecture at scale.
Proceedings of the International Conference on Supercomputing, 2013

Statistically regulating program behavior via mainstream computing.
Proceedings of the CGO 2010, 2010

Lightweight predication support for out of order processors.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Characterizing and Improving the Performance of Bioinformatics Workloads on the POWER5 Architecture.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Automating the construction of compiler heuristics using machine learning.
PhD thesis, 2006

Predicting Unroll Factors Using Supervised Classification.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

Convergent Scheduling.
J. Instr. Level Parallelism, 2004

Meta optimization: improving compiler heuristics with machine learning.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, 2003

Adapting Convergent Scheduling Using Machine-Learning.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Genetic Programming Applied to Compiler Heuristic Optimization.
Proceedings of the Genetic Programming, 6th European Conference, EuroGP 2003, 2003

Bitwidth analysis with application to silicon compilation.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000