Tong Chen

CoRR, 2020

2019

Using Structured Input and Modularity for Improved Learning.

Zehra Sura

Guillaume Thomas-Collignon

CoRR, 2019

Preparation and optimization of a diverse workload for a large-scale heterogeneous system.

Ian Karlin

Yoonho Park

Bronis R. de Supinski

Sara Kokkila Schumacher

Proceedings of the International Conference for High Performance Computing, 2019

POSTER: CogR: Exploiting Program Structures for Machine-Learning Based Runtime Solutions.

Kevin K. O'Brien

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2017

Implementing implicit OpenMP data sharing on GPUs.

Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017

Leveraging OpenMP 4.5 Support in CLANG for Fortran.

Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Efficient Fork-Join on GPUs Through Warp Specialization.

Arpith Chacko Jacob

Samuel F. Antão

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support.

Proceedings of the 7th International Workshop on Performance Modeling, 2016

Offloading Support for OpenMP in Clang and LLVM.

Samuel F. Antão

Alexey Bataev

Proceedings of the Third Workshop on the LLVM Compiler Infrastructure in HPC, 2016

Automatic Copying of Pointer-Based Data Structures.

Zehra Sura

Proceedings of the Languages and Compilers for Parallel Computing, 2016

2015

Active Memory Cube: A processing-in-memory architecture for exascale systems.

IBM J. Res. Dev., 2015

Integrating GPU support for OpenMP offloading directives into Clang.

Samuel Antão

Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

Performance analysis of OpenMP on a GPU using a CORAL proxy application.

Samuel F. Antão

Proceedings of the 6th International Workshop on Performance Modeling, 2015

Progressive Codesign of an Architecture and Compiler Using a Proxy Application.

Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Exploiting Fine- and Coarse-Grained Parallelism Using a Directive Based Approach.

Ravi Nair

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Data access optimization in a processing-in-memory system.

Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

2014

Coordinating GPU threads for OpenMP 4.0 in LLVM.

Samuel Antão

Lakshminarayanan Renganarayanan

Proceedings of the 2014 LLVM Compiler Infrastructure in HPC, 2014

2011

Automatic Loop Tiling for Direct Memory Access.

Haibo Lin

Tao Liu

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

DMATiler: revisiting loop tiling for direct memory access.

Lakshminarayanan Renganarayanan

Kevin O'Brien

Ling Shao

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

DBDB: optimizing DMATransfer for the Cell BE architecture.

Proceedings of the 23rd international conference on Supercomputing, 2009

2008

Supporting OpenMP on Cell.

Int. J. Parallel Program., 2008

Prefetching irregular references for software cache on Cell.

Tao Zhang

Zehra Sura

Marc González Tallada

Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Hybrid access-specific software cache techniques for the Cell BE architecture.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.

Proceedings of the Languages and Compilers for Parallel Computing, 2007

2006

Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture.

IBM Syst. J., 2006

Optimizing the Use of Static Buffers for DMA on a CELL Chip.

Proceedings of the Languages and Compilers for Parallel Computing, 2006

2005

Optimizing Compiler for the CELL Processor.
