Khaled Hamidouche

Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters.


Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015

Porting scientific libraries to PGAS in XSEDE resources: practice and experience.


Antonio Gómez-Iglesias

Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, St. Louis, MO, USA, July 26, 2015

Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.


Proceedings of the High Performance Computing - 30th International Conference, 2015

A case for application-oblivious energy-efficient MPI runtime.


Proceedings of the International Conference for High Performance Computing, 2015

GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.


Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM.


Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Scalable Out-of-core OpenSHMEM Library for HPC.


Antonio Gómez-Iglesias

Jérôme Vienne

Christopher S. Simmons

William L. Barth

Raghunath Raja Chandrasekar

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X.


Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation.


Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.


Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.


Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR.


Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters.


Proceedings of the Euro-Par 2015: Parallel Processing, 2015

High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits.


Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.


Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters.


Akshay Venkatesh

Raghunath Rajachandrasekar

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.


Proceedings of the Supercomputing - 29th International Conference, 2014

Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface.


Proceedings of the 21st European MPI Users' Group Meeting, 2014

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems.


Miao Luo

Xiaoyi Lu

Raghunath Rajachandrasekar

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Scalable MiniMD Design with Hybrid MPI and OpenSHMEM.


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models.


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters.


Akshay Venkatesh

Sreeram Potluri

Miao Luo

Raghunath Rajachandrasekar

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Collective Communication in UPC.


Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters.


Proceedings of the 43rd International Conference on Parallel Processing, 2014

MIC-Check: a distributed check pointing framework for the Intel many integrated cores architecture.


Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters.


Proceedings of the 21st International Conference on High Performance Computing, 2014

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters.


Proceedings of the 21st International Conference on High Performance Computing, 2014

Scalable Graph500 design with MPI-3 RMA.


Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design.


Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++.


Fernando Machado Mendonca

Joel Falcou

Alba Cristina Magalhaes Alves de Melo

Daniel Etiemble

Int. J. Parallel Program., 2013

MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters.


Hari Subramoni

Proceedings of the International Conference for High Performance Computing, 2013

Efficient and truly passive MPI-3 RMA using InfiniBand atomics.


Proceedings of the 20th European MPI Users' Group Meeting, 2013

MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand.


Sreeram Potluri

Hari Subramoni

Proceedings of the International Conference on Supercomputing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.


Proceedings of the 42nd International Conference on Parallel Processing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.


Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters.


Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2011

Programmation des architectures hiérarchiques et hétérogènes. (Programming hierarchical and heterogeneous machines).


PhD thesis, 2011

A framework for an automatic hybrid MPI+OpenMP code generation.


Joel Falcou

Daniel Etiemble

Proceedings of the 2011 Spring Simulation Multi-conference, 2011

Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++.
