George Bosilca

Orcid: 0000-0003-2411-8495

Affiliations:
  • University of Tennessee, Knoxville, USA


According to our database, George Bosilca authored at least 164 papers between 2002 and 2023.

Bibliography

2023
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
ACM Trans. Math. Softw., September, 2023

O(N) distributed direct factorization of structured dense matrices using runtime systems.
CoRR, 2023

Elastic deep learning through resilient collective operations.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Synchronizing MPI Processes in Space and Time.
Proceedings of the 30th European MPI Users' Group Meeting, 2023

Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

O(N) distributed direct factorization of structured dense matrices using runtime systems.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
Evaluating Data Redistribution in PaRSEC.
IEEE Trans. Parallel Distributed Syst., 2022

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC.
IEEE Trans. Parallel Distributed Syst., 2022

Using long vector extensions for MPI reductions.
Parallel Comput., 2022

Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms.
Int. J. Netw. Comput., 2022

MARs: Memory Access Rearrangements in Open MPI.
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022

Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Sequential Task Flow Runtime Model Improvements and Limitations.
Proceedings of the IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers, 2022

Composition of Algorithmic Building Blocks in Template Task Graphs.
Proceedings of the IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X, 2022

Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Implicit Actions and Non-blocking Failure Recovery with MPI.
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Callback-based completion notification using MPI Continuations.
Parallel Comput., 2021

An international survey on MPI users.
Parallel Comput., 2021

Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication.
CoRR, 2021

Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Revisiting Credit Distribution Algorithms for Distributed Termination Detection.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Overhead of using spare nodes.
Int. J. High Perform. Comput. Appl., 2020

Fault tolerance of MPI applications in exascale systems: The ULFM solution.
Future Gener. Comput. Syst., 2020

A survey of MPI usage in the US exascale computing project.
Concurr. Comput. Pract. Exp., 2020

Task bench: a parameterized benchmark for evaluating parallel runtime performance.
Proceedings of the International Conference for High Performance Computing, 2020

The Template Task Graph (TTG) - an emerging practical dataflow programming paradigm for scientific simulation at extreme scale.
Proceedings of the 5th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2020

Using Advanced Vector Extensions AVX-512 for MPI Reductions.
Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications.
Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

FFT-based Gradient Sparsification for the Distributed Training of Deep Neural Networks.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

HAN: a Hierarchical AutotuNed Collective Communication Framework.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Predicting MPI Collective Communication Performance Using Machine Learning.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Flexible Data Redistribution in a Task-Based Runtime System.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Using Arm Scalable Vector Extension to Optimize OPEN MPI.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms.
Parallel Comput., 2019

Checkpointing Strategies for Shared High-Performance Computing Platforms.
Int. J. Netw. Comput., 2019

Local rollback for resilient MPI applications with application-level checkpointing and message logging.
Future Gener. Comput. Syst., 2019

Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization.
Proceedings of the 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI, 2019

Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications.
Proceedings of the 9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2019

Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Runtime level failure detection and propagation in HPC systems.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

A failure detector for HPC platforms.
Int. J. High Perform. Comput. Appl., 2018

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks.
CoRR, 2018

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

ADAPT: an event-based adaptive collective communication framework.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Do Moldable Applications Perform Better on Failure-Prone HPC Platforms?
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

2017
Dynamic task discovery in PaRSEC: a data-flow task-based runtime.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

Using software-based performance counters to expose low-level open MPI performance information.
Proceedings of the 24th European MPI Users' Group Meeting, 2017

Efficient Communications in Training Large Scale Neural Networks.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Online Dynamic Monitoring of MPI Communications.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results.
Parallel Comput., 2016

Failure detection and propagation in HPC systems.
Proceedings of the International Conference for High Performance Computing, 2016

Surviving Errors with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

GPU-Aware Non-contiguous Data Movement In Open MPI.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver.
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

DSN 2016 Tutorial: Resilience for Scientific Computing: From Theory to Practice.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2016

2015
Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.
ACM Trans. Parallel Comput., 2015

Composing resilience techniques: ABFT, periodic and incremental checkpointing.
Int. J. Netw. Comput., 2015

Practical scalable consensus for pseudo-synchronous distributed systems.
Proceedings of the International Conference for High Performance Computing, 2015

Sliding Substitution of Failed Nodes.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

From MPI to OpenSHMEM: Porting LAMMPS.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Hierarchical DAG Scheduling for Hybrid Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Design for a Soft Error Resilient Dynamic Task-Based Runtime.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems.
Parallel Comput., 2014

Power profiling of Cholesky and QR factorizations on distributed memory systems.
Comput. Sci. Res. Dev., 2014

Unified model for assessing checkpointing protocols at extreme-scale.
Concurr. Comput. Pract. Exp., 2014

PTG: an abstraction for unhindered parallelism.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Optimizations to enhance sustainability of MPI applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

A Multithreaded Communication Substrate for OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Assessing the Impact of ABFT and Checkpoint Composite Strategies.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Task-Based Programming for Seismic Imaging: Preliminary Results.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Utilizing dataflow-based execution for coupled cluster methods.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms.
J. Parallel Distributed Comput., 2013

Post-failure recovery of MPI communication capability: Design and rationale.
Int. J. High Perform. Comput. Appl., 2013

PaRSEC: Exploiting Heterogeneity to Enhance Scalability.
Comput. Sci. Eng., 2013

Correlated set coordination in fault tolerant message logging protocols for many-core clusters.
Concurr. Comput. Pract. Exp., 2013

Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.
Concurr. Comput. Pract. Exp., 2013

An evaluation of User-Level Failure Mitigation support in MPI.
Computing, 2013

CPU-GPU hybrid bidiagonal reduction with soft error resilience.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Parallel reduction to hessenberg form with algorithm-based fault tolerance.
Proceedings of the International Conference for High Performance Computing, 2013

Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures.
Proceedings of the IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, 2013

2012
DAGuE: A generic distributed DAG engine for High Performance Computing.
Parallel Comput., 2012

Algorithm-based fault tolerance for dense matrix factorizations.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

From Serial Loops to Parallel Execution on Distributed Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

OMPIO: A Modular Software Architecture for MPI I/O.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Will MPI Remain Relevant?
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs.
Proceedings of the International Conference on Parallel Processing, 2011

The Common Communication Interface (CCI).
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Correlated Set Coordination in Fault Tolerant Message Logging Protocols.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar 2011).
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Process Distance-Aware Adaptive MPI Collective Communications.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

On Scalability for MPI Runtime Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI.
Proceedings of the International Conference on Computational Science, 2010

Self-healing network for scalable fault-tolerant runtime environments.
Future Gener. Comput. Syst., 2010

Redesigning the message logging model for high performance.
Concurr. Comput. Pract. Exp., 2010

Locality and Topology Aware Intra-node Communication among Multicore CPUs.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

2009
Algorithm-based fault tolerance applied to high performance computing.
J. Parallel Distributed Comput., 2009

Constructing Resiliant Communication Infrastructure for Runtime Environments.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Algorithmic Based Fault Tolerance Applied to High Performance Computing.
CoRR, 2008

The Next Frontier.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

A Scalable Tools Communications Infrastructure.
Proceedings of the 22nd Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2008), 2008

2007
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment.
SIAM J. Sci. Comput., 2007

Open MPI: a High Performance, Flexible Implementation of MPI Point-to-Point Communications.
Parallel Process. Lett., 2007

MPI collective algorithm selection and quadtree encoding.
Parallel Comput., 2007

Performance analysis of MPI collective operations.
Clust. Comput., 2007

Advanced MPI Programming.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

An Evaluation of Open MPI's Matching Transport Layer on the Cray XT.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

The X-Scale Challenge.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Optimal Routing in Binomial Graph Networks.
Proceedings of the Eighth International Conference on Parallel and Distributed Computing, 2007

Self-healing in Binomial Graph Networks.
Proceedings of the On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, 2007

Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology.
Proceedings of the Parallel and Distributed Processing and Applications, 2007

Network Fault Tolerance in Open MPI.
Proceedings of the Euro-Par 2007, 2007

Decision Trees and MPI Collective Algorithm Selection Problem.
Proceedings of the Euro-Par 2007, 2007

Topic 9 Parallel and Distributed Programming.
Proceedings of the Euro-Par 2007, 2007

Reliability Analysis of Self-Healing Network using Discrete-Event Simulation.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Self-adapting numerical software (SANS) effort.
IBM J. Res. Dev., 2006

High Performance RDMA Protocols in HPC.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Implementation and Usage of the PERUSE-Interface in Open MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Scalable Fault Tolerant Protocol for Parallel Runtime Environments.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Open MPI: A High-Performance, Heterogeneous MPI.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing.
Int. J. High Perform. Comput. Appl., 2005

Hash Functions for Datatype Signatures in MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Advanced Message Passing and Threading Issues.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Scalable Fault Tolerant MPI: Extending the Recovery Algorithm.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Analysis of the Component Architecture Overhead in Open MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Fault tolerant high performance computing by a coding approach.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

2004
OVM, une machine parallèle virtuelle à exécution dans le désordre.
Tech. Sci. Informatiques, 2004

TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

2002
OVM: Out-of-order execution parallel virtual machine.
Future Gener. Comput. Syst., 2002

MPICH-V: toward a scalable fault tolerant MPI for volatile nodes.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

MPICH-CM: A Communication Library Design for a P2P MPI Implementation.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002

