Apan Qasem

According to our database, Apan Qasem authored at least 53 papers between 2001 and 2023.

Bibliography

2023
Report on 2023 CyberTraining PI Meeting, 26-27 September 2023.
CoRR, 2023

GPU-accelerated Parallel Solutions to the Quadratic Assignment Problem.
CoRR, 2023

Workshop Invited Talks.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

ToUCH Virtual Faculty Development Workshops: Going Beyond a Webinar.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

2022
Uncovering input-sensitive energy bottlenecks in oversubscribed GPU workloads.
Sustain. Comput. Informatics Syst., 2022

Heterogeneous Computing for Undergraduates: Introducing the ToUCH Module Repository.
Proceedings of the SIGCSE 2022: The 53rd ACM Technical Symposium on Computer Science Education, 2022

YODA: A Pedagogical Tool for Teaching Systems Concepts.
Proceedings of the SIGCSE 2022: The 53rd ACM Technical Symposium on Computer Science Education, 2022

Raptor: Mitigating CPU-GPU False Sharing Under Unified Memory Systems.
Proceedings of the 13th IEEE International Green and Sustainable Computing Conference, 2022

Optimal Launch Bound Selection in CPU-GPU Hybrid Graph Applications with Deep Learning.
Proceedings of the 13th IEEE International Green and Sustainable Computing Conference, 2022


Adopting Heterogeneous Computing Modules: Experiences from a ToUCH Summer Workshop.
Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022

2021
A module-based introduction to heterogeneous computing in core courses.
J. Parallel Distributed Comput., 2021

Teaching about Heterogeneous Computing.
Proceedings of the SIGCSE '21: The 52nd ACM Technical Symposium on Computer Science Education, 2021

Migrating Software from x86 to ARM Architecture: An Instruction Prediction Approach.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

Characterizing Input-sensitivity in Tightly-Coupled Collaborative Graph Algorithms.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Intelligent Data Placement on Discrete GPU Nodes with Unified Memory.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels.
CoRR, 2019

A Gentle Introduction to Heterogeneous Computing for CS1 Students.
Proceedings of the 2019 IEEE/ACM Workshop on Education for High-Performance Computing, 2019

Accelerating HotSpots in Deep Neural Networks on a CAPI-Based FPGA.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Energy-Efficient GPU Graph Processing with On-Demand Page Migration.
Proceedings of the Tenth International Green and Sustainable Computing Conference, 2019

2018
Investigating Data Layout Transformations in Chapel.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Modules for Teaching Parallel Performance Concepts.
Proceedings of the Topics in Parallel and Distributed Computing, 2018

2017
A Machine Learning Approach to Automatic Creation of Architecture-Sensitive Performance Heuristics.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Automatically Selecting Profitable Thread Block Sizes for Accelerated Kernels.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Mitigating register pressure in GPU kernels for improved energy efficiency.
Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Evaluating the impact of data layout and placement on the energy efficiency of heterogeneous applications.
Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Characterizing data organization effects on heterogeneous memory architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2015
A SIMD tabu search implementation for solving the quadratic assignment problem with GPU acceleration.
Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, St. Louis, MO, USA, July 26, 2015

A Module-based Approach to Adopting the 2013 ACM Curricular Recommendations on Parallel Computing.
Proceedings of the 46th ACM Technical Symposium on Computer Science Education, 2015

Maximizing Hardware Prefetch Effectiveness with Machine Learning.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Autotuning GPU-Accelerated QAP Solvers for Power and Performance.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Neural network methods for fast and portable prediction of CPU power consumption.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Power-performance analysis of metaheuristic search algorithms on the GPU.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

Realizing energy-efficient thread affinity configurations with supervised learning.
Proceedings of the Sixth International Green and Sustainable Computing Conference, 2015

2014
A SIMD Solution for the Quadratic Assignment Problem with GPU Acceleration.
Proceedings of the Annual Conference of the Extreme Science and Engineering Discovery Environment, 2014

2013
Improving TLB performance on current chip multiprocessor architectures through demand-driven superpaging.
Softw. Pract. Exp., 2013

2012
Efficient parallel solutions to the integral knapsack problem on current chip-multiprocessor systems.
Int. J. Parallel Emergent Distributed Syst., 2012

Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations.
Proceedings of the 2012 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2012

Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality.
Proceedings of the Compiler Construction - 21st International Conference, 2012

2011
Poster: register pressure aware code transformations on GPU.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Understanding stencil code performance on multicore architectures.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
Exposing Tunable Parameters in Multi-threaded Numerical Code.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

Restructuring parallel loops to curb false sharing on multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

An Evaluation of Parallel Knapsack Algorithms on Multicore Architectures.
Proceedings of the 2010 International Conference on Scientific Computing, 2010

2009
Balancing Locality and Parallelism on Shared-cache Multi-core Systems.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

A case for compiler-driven superpage allocation.
Proceedings of the 47th Annual Southeast Regional Conference, 2009

2008
Model-guided empirical tuning of loop fusion.
Int. J. High Perform. Syst. Archit., 2008

Evaluating an Early-stop Criterion and a Statistical Pruning Strategy of the Optimization Search Space.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2008

Exploring the Optimization Space of Dense Linear Algebra Kernels.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

2006
Automatic tuning of whole applications using direct search and a performance-based transformation system.
J. Supercomput., 2006

Profitable loop fusion and tiling using model-driven empirical search.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005
A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

2001
Using a Swap Instruction to Coalesce Loads and Stores.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

