Ann C. Gentile

Affiliations:
  • Sandia National Laboratories, USA


According to our database, Ann C. Gentile authored at least 37 papers between 1997 and 2023.

Bibliography

2023
Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171).
Dagstuhl Reports, 2023


2021
Delay sensitivity-driven congestion mitigation for HPC systems.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
Application-aware Congestion Mitigation for High-Performance Computing Systems.
CoRR, 2020

ALAMO: Autonomous Lightweight Allocation, Management, and Optimization.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Measuring Congestion in High-Performance Datacenter Interconnects.
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

2019
Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo.
CoRR, 2019

A Study of Network Congestion in Two Supercomputing High-Speed Interconnects.
Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

2018
Integrating Low-latency Analysis into HPC System Monitoring.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Characterizing Supercomputer Traffic Networks Through Link-Level Analysis.
Proceedings of the IEEE International Conference on Cluster Computing, 2018


2017
Holistic Measurement-Driven System Assessment.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems.
Parallel Comput., 2016

Large-Scale Persistent Numerical Data Source Monitoring System Experiences.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HPCMASPA Introduction and Committees.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Overtime: a tool for analyzing performance variation due to network interference.
Proceedings of the 3rd Workshop on Exascale MPI, 2015

Infrastructure for In Situ System Monitoring and Application Data Analysis.
Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, 2015

New Systems, New Behaviors, New Patterns: Monitoring Insights from System Standup.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Toward Rapid Understanding of Production HPC Applications and Systems.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC14), 2014

Demonstrating improved application performance using dynamic monitoring and task mapping.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2012
Filtering log data: Finding the needles in the Haystack.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011
Baler: deterministic, lossless log message clustering tool.
Comput. Sci. Res. Dev., 2011

Framework for Enabling System Understanding.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

2010
Combining Virtualization, resource characterization, and Resource management to enable efficient high performance compute platforms through intelligent dynamic resource allocation.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W 2010), Chicago, Illinois, USA, June 28, 2010

Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010), 2010

2009
Resource monitoring and management with OVIS to enable HPC in cloud computing environments.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008
Ovis-2: A robust distributed architecture for scalable RAS.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2006
OVIS: a tool for intelligent, real-time monitoring of computational clusters.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Meaningful Automated Statistical Analysis of Large Computational Clusters.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

1999
Lilith Lights: A Network Traffic Visualization Tool for High Performance Clusters.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

The Lilith framework for the rapid development of secure scalable tools for distributed computing (short paper).
Proceedings of the Distributed Applications and Interoperable Systems II, Second IFIP WG 6.1 International Working Conference on Distributed Applications and Interoperable Systems, June 28, 1999

1998
A visualization tool for parallel and distributed computing using the Lilith framework.
Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1998

Lilith: A Software Framework for the Rapid Development of Scalable Tools for Distributed Computing.
Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998

1997
Lilith: Scalable Execution of User Code for Distributed Computing.
Proceedings of the 6th International Symposium on High Performance Distributed Computing, 1997
