Ignacio Laguna

According to our database, Ignacio Laguna authored at least 44 papers between 2007 and 2019.

Bibliography

2019
Failure recovery for bulk synchronous applications with MPI stages.
Parallel Computing, 2019

Pruners.
IJHPCA, 2019

GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications.
Proceedings of the High Performance Computing - 34th International Conference, 2019

A large-scale study of MPI usage in open-source HPC applications.
Proceedings of the International Conference for High Performance Computing, 2019

SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

AMPT-GA: automatic mixed precision floating point tuning for GPU applications.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
PARIS: Predicting Application Resilience Using Machine Learning.
CoRR, 2018

Multi-level analysis of compiler induced variability and performance tradeoffs.
CoRR, 2018

FlipTracker: Understanding Natural Error Resilience in HPC Applications.
CoRR, 2018

MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017
Exploring versioned distributed arrays for resilience in scientific applications.
IJHPCA, 2017

Report of the HPC Correctness Summit, Jan 25-26, 2017, Washington, DC.
CoRR, 2017

Snowpack: efficient parameter choice for GPU kernels via static analysis and statistical prediction.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

REFINE: realistic fault injection via compiler-based instrumentation for accuracy, portability and speed.
Proceedings of the International Conference for High Performance Computing, 2017

Noise Injection Techniques to Expose Subtle and Unintended Message Races.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2017

2016
Evaluating and extending user-level fault tolerance in MPI applications.
IJHPCA, 2016

Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications.
Proceedings of the International Conference for High Performance Computing, 2016

Testing Infrastructure for OpenMP Debugging Interface Implementations.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

IPAS: intelligent protection against silent output corruption in scientific applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

2015
Diagnosis of Performance Faults in Large-Scale MPI Applications via Probabilistic Progress-Dependence Inference.
IEEE Trans. Parallel Distrib. Syst., 2015

Debugging high-performance computing applications at massive scales.
Commun. ACM, 2015

Clock delta compression for scalable order-replay of non-deterministic parallel applications.
Proceedings of the International Conference for High Performance Computing, 2015

Lessons Learned from Implementing OMPD: A Debugging Interface for OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience.
Proceedings of the International Conference on Computational Science, 2015

2014
Towards providing low-overhead data race detection for large OpenMP applications.
Proceedings of the 2014 LLVM Compiler Infrastructure in HPC, 2014

Evaluating User-Level Fault Tolerance for MPI Applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Accurate application progress analysis for large-scale parallel debugging.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

2013
Automatic Problem Localization via Multi-dimensional Metric Profiling.
Proceedings of the IEEE 32nd Symposium on Reliable Distributed Systems, 2013

A study of application-level recovery methods for transient network faults.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset.
Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering, 2013

Performance Analysis Techniques for the Exascale Co-Design Process.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

2012
Automatic fault characterization via abnormality-enhanced classification.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Probabilistic diagnosis of performance faults in large-scale parallel applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Large scale debugging of parallel tasks with AutomaDeD.
Proceedings of the Conference on High Performance Computing Networking, 2011

2010
AutomaDeD: Automata-based debugging for dissimilar parallel tasks.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

2009
Scalable temporal order analysis for large scale debugging.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

How to Keep Your Head above Water While Detecting Errors.
Proceedings of the Middleware 2009, ACM/IFIP/USENIX, 10th International Middleware Conference, Urbana, IL, USA, November 30, 2009

Stateful error detection in high throughput applications.
Proceedings of the Middleware 2008, 2009

2007
Stateful Detection in High Throughput Distributed Systems.
Proceedings of the 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), 2007

Distributed Diagnosis of Failures in a Three Tier E-Commerce System.
Proceedings of the 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), 2007

