Apan Qasem

Proceedings of the Tenth International Green and Sustainable Computing Conference, 2019

2018

Investigating Data Layout Transformations in Chapel.

Ashwin M. Aji

Michael L. Chu

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Modules for Teaching Parallel Performance Concepts.

Proceedings of the Topics in Parallel and Distributed Computing, 2018

2017

A Machine Learning Approach to Automatic Creation of Architecture-Sensitive Performance Heuristics.

Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Automatically Selecting Profitable Thread Block Sizes for Accelerated Kernels.

Tiffany A. Connors

Mitigating register pressure in GPU kernels for improved energy efficiency.

Samuel Teich

Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Evaluating the impact of data layout and placement on the energy efficiency of heterogeneous applications.

Samuel Teich

Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Characterizing data organization effects on heterogeneous memory architectures.

Ashwin M. Aji

Gregory Rodgers

Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2015

A SIMD tabu search implementation for solving the quadratic assignment problem with GPU acceleration.

Abhilash Chaparala

Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, St. Louis, MO, USA, July 26, 2015

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing.

Proceedings of the 46th ACM Technical Symposium on Computer Science Education, 2015

Maximizing Hardware Prefetch Effectiveness with Machine Learning.

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Autotuning GPU-Accelerated QAP Solvers for Power and Performance.

Abhilash Chaparala

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Neural network methods for fast and portable prediction of CPU power consumption.

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Power-performance analysis of metaheuristic search algorithms on the GPU.

Tiffany A. Connors

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Realizing energy-efficient thread affinity configurations with supervised learning.

Claudia Alvarado

Dan E. Tamir

Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

2014

A SIMD Solution for the Quadratic Assignment Problem with GPU Acceleration.

Abhilash Chaparala

Proceedings of the Annual Conference of the Extreme Science and Engineering Discovery Environment, 2014

2013

Improving TLB performance on current chip multiprocessor architectures through demand-driven superpaging.

Joshua Magee

Softw. Pract. Exp., 2013

2012

Efficient parallel solutions to the integral knapsack problem on current chip-multiprocessor systems.

Int. J. Parallel Emergent Distributed Syst., 2012

Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations.

Proceedings of the 2012 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2012

Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality.

Swapneela Unkule

Christopher Shaltz

Proceedings of the Compiler Construction - 21st International Conference, 2012

2011

Poster: register pressure aware code transformations on GPU.

Swapneela Unkule

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Understanding stencil code performance on multicore architectures.

Faizur Rahman

Qing Yi

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

Exposing Tunable Parameters in Multi-threaded Numerical Code.

Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Restructuring parallel loops to curb false sharing on multicore architectures.

Santosh Sarangkar

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

An Evaluation of Parallel Knapsack Algorithms on Multicore Architectures.

Hammad Rashid

Proceedings of the 2010 International Conference on Scientific Computing, 2010

2009

Balancing Locality and Parallelism on Shared-cache Multi-core Systems.

Michael Jason Cade

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

A case for compiler-driven superpage allocation.

Joshua Magee

Proceedings of the 47th Annual Southeast Regional Conference, 2009

2008

Model-guided empirical tuning of loop fusion.

Int. J. High Perform. Syst. Archit., 2008

Evaluating an Early-stop Criterion and a Statistical Pruning Strategy of the Optimization Search Space.

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2008

Exploring the Optimization Space of Dense Linear Algebra Kernels.

Qing Yi

Proceedings of the Languages and Compilers for Parallel Computing, 2008

2006

Automatic tuning of whole applications using direct search and a performance-based transformation system.

John M. Mellor-Crummey

J. Supercomput., 2006

Profitable loop fusion and tiling using model-driven empirical search.

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005

A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion.
