Aurelien Bouteiller

Orcid: 0000-0001-5108-509X

According to our database, Aurelien Bouteiller authored at least 68 papers between 2002 and 2023.

Bibliography

2023
Elastic deep learning through resilient collective operations.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022
Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms.
Int. J. Netw. Comput., 2022

Implicit Actions and Non-blocking Failure Recovery with MPI.
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Revisiting Credit Distribution Algorithms for Distributed Termination Detection.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Overhead of using spare nodes.
Int. J. High Perform. Comput. Appl., 2020

Fault tolerance of MPI applications in exascale systems: The ULFM solution.
Future Gener. Comput. Syst., 2020

Flexible Data Redistribution in a Task-Based Runtime System.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Performance of asynchronous optimized Schwarz with one-sided communication.
Parallel Comput., 2019

Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms.
Parallel Comput., 2019

Checkpointing Strategies for Shared High-Performance Computing Platforms.
Int. J. Netw. Comput., 2019

Local rollback for resilient MPI applications with application-level checkpointing and message logging.
Future Gener. Comput. Syst., 2019

Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications.
Proceedings of the 9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2019

Runtime level failure detection and propagation in HPC systems.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

2018
PMIx: Process management for exascale environments.
Parallel Comput., 2018

A failure detector for HPC platforms.
Int. J. High Perform. Comput. Appl., 2018

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Do Moldable Applications Perform Better on Failure-Prone HPC Platforms?
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

2017
A Framework for Out of Memory SVD Algorithms.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Evaluating Contexts in OpenSHMEM-X Reference Implementation.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

2016
Failure detection and propagation in HPC systems.
Proceedings of the International Conference for High Performance Computing, 2016

Surviving Errors with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

2015
Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.
ACM Trans. Parallel Comput., 2015

Composing resilience techniques: ABFT, periodic and incremental checkpointing.
Int. J. Netw. Comput., 2015

Practical scalable consensus for pseudo-synchronous distributed systems.
Proceedings of the International Conference for High Performance Computing, 2015

Sliding Substitution of Failed Nodes.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

From MPI to OpenSHMEM: Porting LAMMPS.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Hierarchical DAG Scheduling for Hybrid Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015


2014
Unified model for assessing checkpointing protocols at extreme-scale.
Concurr. Comput. Pract. Exp., 2014

PTG: an abstraction for unhindered parallelism.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

A Multithreaded Communication Substrate for OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Assessing the Impact of ABFT and Checkpoint Composite Strategies.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013
Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms.
J. Parallel Distributed Comput., 2013

Post-failure recovery of MPI communication capability: Design and rationale.
Int. J. High Perform. Comput. Appl., 2013

PaRSEC: Exploiting Heterogeneity to Enhance Scalability.
Comput. Sci. Eng., 2013

Correlated set coordination in fault tolerant message logging protocols for many-core clusters.
Concurr. Comput. Pract. Exp., 2013

Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.
Concurr. Comput. Pract. Exp., 2013

An evaluation of User-Level Failure Mitigation support in MPI.
Computing, 2013

Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures.
Proceedings of the IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, 2013

Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
DAGuE: A generic distributed DAG engine for High Performance Computing.
Parallel Comput., 2012

Algorithm-based fault tolerance for dense matrix factorizations.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

From Serial Loops to Parallel Execution on Distributed Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs.
Proceedings of the International Conference on Parallel Processing, 2011

Correlated Set Coordination in Fault Tolerant Message Logging Protocols.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Redesigning the message logging model for high performance.
Concurr. Comput. Pract. Exp., 2010

Locality and Topology Aware Intra-node Communication among Multicore CPUs.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

2009
Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Fault Tolerance Management for a Hierarchical GridRPC Middleware.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

2006
MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI.
Int. J. High Perform. Comput. Appl., 2006

Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids.
Int. J. High Perform. Comput. Appl., 2006

Diet: New Developments and Recent Results.
Proceedings of the Euro-Par 2006 Workshops: Parallel Processing, 2006

2005
Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
Coordinated checkpoint versus message log for fault tolerant MPI.
Int. J. High Perform. Comput. Netw., 2004

Hybrid Preemptive Scheduling of MPI Applications on the Grids.
Proceedings of the 5th International Workshop on Grid Computing (GRID 2004), 2004

Improved message logging versus improved coordinated checkpointing for fault tolerant MPI.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

2002
MPICH-V: toward a scalable fault tolerant MPI for volatile nodes.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

