John A. Stratton

According to our database, John A. Stratton authored at least 22 papers between 2008 and 2021.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Kernel Fusion in OpenCL.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Enhancing Faculty-Student Interaction in an Undergraduate Algorithms Course Through Group Oral Presentations.
Proceedings of the CEP '21: Computing Education Practice 2021, 2021

Optimizing Halide for Digital Signal Processors.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2020

Accelerating dynamically typed languages with a virtual function cache.
Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Performance portability of parallel kernels on shared-memory systems.
PhD thesis, 2013

Efficient compilation of CUDA kernels for high-performance computing on FPGAs.
ACM Trans. Embed. Comput. Syst., 2013

Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications.
Int. J. Parallel Program., 2012

Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems.
Computer, 2012

A scalable, numerically stable, high-performance tridiagonal solver using GPUs.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Multilevel Granularity Parallelism Synthesis on FPGAs.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

Implementing a GPU Programming Model on a Non-GPU Accelerator Architecture.
Proceedings of the Computer Architecture, 2010

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs.
Proceedings of the CGO 2010, 2010

Data layout transformation exploiting memory-level parallelism in structured grid many-core applications.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Compute Unified Device Architecture Application Suitability.
Comput. Sci. Eng., 2009

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs.
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009

High-performance CUDA kernel execution on FPGAs.
Proceedings of the 23rd international conference on Supercomputing, 2009

Program optimization carving for GPU computing.
J. Parallel Distributed Comput., 2008

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Program optimization space pruning for a multithreaded GPU.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008