Stephen L. Scott

According to our database, Stephen L. Scott
  • authored at least 94 papers between 1994 and 2016.
  • has a "Dijkstra number" of four.

Bibliography

2016
Adding Fault Tolerance to NPB Benchmarks Using ULFM.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2016

A Cooperative Approach to Virtual Machine Based Fault Injection.
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

2014
What Is the Right Balance for Performance and Isolation with Virtualization in HPC?
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Efficient Checkpointing of Virtual Machines Using Virtual Machine Introspection.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2012
Proactive process-level live migration and back migration in HPC environments.
J. Parallel Distrib. Comput., 2012

Architecture for the next generation system management tools.
Future Generation Comp. Syst., 2012

Distributed Virtual Diskless Checkpointing: A Highly Fault Tolerant Scheme for Virtualized Clusters.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011
5th Workshop on System-Level Virtualization for High Performance Computing (HPCVirt 2011).
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

A Case for Virtual Machine Based Fault Injection in a High-Performance Computing Environment.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

2010
Reliability of a System of k Nodes for High Performance Computing Applications.
IEEE Trans. Reliability, 2010

Incremental Checkpoint Schemes for Weibull Failure Distribution.
Int. J. Found. Comput. Sci., 2010

System-level virtualization research at Oak Ridge National Laboratory.
Future Generation Comp. Syst., 2010

Benefits of Software Rejuvenation on HPC Systems.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2010

Hybrid Checkpointing for MPI Jobs in HPC Environments.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Loadable Hypervisor Modules.
Proceedings of the 43rd Hawaii International Conference on Systems Science (HICSS-43 2010), 2010

2009
Symmetric active/active metadata service for high availability parallel file systems.
J. Parallel Distrib. Comput., 2009

A tunable holistic resiliency approach for high-performance computing systems.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Proactive Fault Tolerance Using Preemptive Migration.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Refinement Proposal of the Goldberg's Theory.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2009

An Extensible I/O Performance Analysis Framework for Distributed Environments.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Blue Gene/L Log Analysis and Time to Interrupt Estimation.
Proceedings of the Fourth International Conference on Availability, 2009

2008
Proactive process-level live migration in HPC environments.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

System-Level Virtualization for High Performance Computing.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Virtualized Environments for the Harness High Performance Computing Workbench.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Failure Prediction Models for Proactive Fault Tolerance Within Storage Environments.
Proceedings of the 16th International Symposium on Modeling, 2008

An optimal checkpoint/restart model for a large scale high performance computing system.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Proposal for Modifications to the OSCAR Architecture to Address Challenges in Distributed System Management.
Proceedings of the 22nd Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2008), 2008

An Analysis of HPC Benchmarks in Virtual Machine Environments.
Proceedings of the Euro-Par 2008 Workshops, 2008

Complementarity between Virtualization and Single System Image Technologies.
Proceedings of the Euro-Par 2008 Workshops, 2008

Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

A Framework for Proactive Fault Tolerance.
Proceedings of the Third International Conference on Availability, 2008

Symmetric Active/Active Replication for Dependent Services.
Proceedings of the Third International Conference on Availability, 2008

2007
A unified multiple-level cache for high performance storage systems.
IJHPCN, 2007

A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Proactive fault tolerance for HPC with Xen virtualization.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

A Fast Delivery Protocol for Total Order Broadcasting.
Proceedings of the 16th International Conference on Computer Communications and Networks, 2007

Middleware in Modern High Performance Computing System Architectures.
Proceedings of the Computational Science - ICCS 2007, 7th International Conference, Beijing, China, May 27, 2007

Automatic Testing Tool for OSCAR Using System-level Virtualization.
Proceedings of the 21st Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2007), 2007

Design and Implementation of a Menu Based OSCAR Command Line Interface.
Proceedings of the 21st Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2007), 2007

Evaluation of fault-tolerant policies using simulation.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

A reliability-aware approach for an optimal checkpoint/restart model in HPC environments.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Reliability-aware resource allocation in HPC systems.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

System management software for virtual environments.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Transparent Symmetric Active/Active Replication for Service-Level High Availability.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

On Programming Models for Service-Level High Availability.
Proceedings of the Second International Conference on Availability, 2007

2006
Constructing collaborative desktop storage caches for large scientific datasets.
TOS, 2006

MOLAR: adaptive runtime support for high-end computing operating and runtime systems.
Operating Systems Review, 2006

Symmetric Active/Active High Availability for High-Performance Computing System Services.
JCP, 2006

OSCAR - OSCAR community meeting.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Xen-OSCAR for Cluster Virtualization.
Proceedings of the Frontiers of High Performance Computing and Networking, 2006

Scalable, fault tolerant membership for MPI tasks on HPC systems.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Coupling prefix caching and collective downloads for remote dataset access.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

OSCAR Testing with Xen.
Proceedings of the 20th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2006), 2006

A Component-Based Approach to Improving the Modularity of OSCAR.
Proceedings of the 20th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2006), 2006

JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

IPMI-based Efficient Notification Framework for Large Scale Cluster Computing.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Active/Active Replication for Highly Available HPC System Services.
Proceedings of the First International Conference on Availability, 2006

2005
Achieving high availability and performance computing with an HA-OSCAR cluster.
Future Generation Comp. Syst., 2005

UML-based Beowulf Cluster Availability Modeling.
Proceedings of the International Conference on Software Engineering Research and Practice, 2005

FreeLoader: Scavenging Desktop Storage Resources for Scientific Data.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

A Unified Multiple-Level Cache for High Performance Storage Systems.
Proceedings of the 13th International Symposium on Modeling, 2005

Model-Based Statistical Testing of a Cluster Utility.
Proceedings of the Computational Science, 2005

SSI-OSCAR: A Cluster Distribution for High Performance Computing Using a Single System Image.
Proceedings of the 19th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2005), 2005

OSCAR Meta-Package System.
Proceedings of the 19th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2005), 2005

Grid-Aware HA-OSCAR.
Proceedings of the 19th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2005), 2005

Reliability-aware resource management for computational grid/cluster environments.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

Reliability-aware Checkpoint/Restart Scheme: A Performability Trade-off.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Job-Site Level Fault Tolerance for Cluster and Grid environments.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
Highly Reliable Linux HPC Clusters: Self-Awareness Approach.
Proceedings of the Parallel and Distributed Processing and Applications, 2004

Online Remote Data Backup for iSCSI-Based Storage Systems.
Proceedings of the International Conference on Internet Computing, 2004

2003
ORNL-RSH Package and Windows '03 PVM 3.4.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

Dependability Prediction of High Availability OSCAR Cluster Server.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

Availability Prediction and Modeling of High Availability OSCAR Cluster.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002
Distributed Peer-to-Peer Control in Harness.
Proceedings of the Computational Science - ICCS 2002, 2002

2001
Cluster Command and Control (C3) Tool Suite.
Scalable Computing: Practice and Experience, 2001

Systems Administration.
IJHPCA, 2001

VIA Communication Performance on a Gigabit Ethernet Cluster.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

OSCAR and the Beowulf Arms Race for the "Cluster Standard".
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

M3C: Managing and Monitoring Multiple Clusters.
Proceedings of the First IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2001), 2001

2000
GigaBit Performance under NT.
Proceedings of the Parallel and Distributed Processing, 2000

Tutorial A: Design and Analysis of High Performance Clusters.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

ORNL M3C tool.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

Enabling High Performance Data Transfer on Cluster Architecture.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

1999
Harness: Adaptable Virtual Machine Environment for Heterogeneous Clusters.
Parallel Processing Letters, 1999

1998
PVM on Windows and NT Clusters.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1998

HARNESS: Heterogeneous Adaptable Reconfigurable NEtworked SystemS.
Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998

1997
Work-based performance measurement and analysis of virtual heterogeneous machines.
Int. J. Systems Science, 1997

Beyond PVM 3.4: What We've Learned, What's New, and Why.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1997

1994
ASC: An Associative-Computing Paradigm.
IEEE Computer, 1994

A Task Graph Centroid.
Proceedings of the Third International Symposium on High Performance Distributed Computing, 1994

