Long Wang

Proceedings of the 13th IEEE International Conference on Cloud Computing, 2020

2019

LADRA: Log-based abnormal task detection and root-cause analysis in big data processing with Spark.

Future Gener. Comput. Syst., 2019

System Restore in a Multi-cloud Data Pipeline Platform.

Valentina Salapura

Robin Arnold

Xu Wang

Senthil Bakthavachalam

Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

2018

KEREP: Experience in Extracting Knowledge on Distributed System Behavior through Request Execution Path.

Proceedings of the 2018 IEEE International Symposium on Software Reliability Engineering Workshops, 2018

Transparently Capturing Execution Path of Service/Job Request Processing.

Proceedings of the 16th International Conference on Service-Oriented Computing (ICSOC 2018), 2018

2017

Failure Diagnosis for Distributed Systems Using Targeted Fault Injection.

Zbigniew T. Kalbarczyk

Ravishankar K. Iyer

IEEE Trans. Parallel Distributed Syst., 2017

Log-based Abnormal Task Detection and Root Cause Analysis for Spark.

Proceedings of the 2017 IEEE International Conference on Web Services, 2017

Providing Resiliency to Orchestration and Automation Engines in Hybrid Cloud.

Alexei Karve

Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2017

Predicting Misconfiguration-Induced Unsuccessful Executions of Jobs in Big Data System.

Proceedings of the 41st IEEE Annual Computer Software and Applications Conference, 2017

2016

Activating Protection and Exercising Recovery Against Large-Scale Outages on the Cloud.

Ruchi Mahindru

Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2016

Disaster Recovery for Cloud-Hosted Enterprise Applications.

Ruchi Mahindru

Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

Auto-tuning Performance of MPI Parallel Programs Using Resource Management in Container-Based Virtual Cloud.

Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

2015

VM-μCheckpoint: Design, Modeling, and Assessment of Lightweight In-Memory VM Checkpointing.

IEEE Trans. Dependable Secur. Comput., 2015

Experiences with Building Disaster Recovery for Enterprise-Class Clouds.

Mahesh Viswanathan

Edmond Plattier

Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

2014

Toward achieving operational excellence in a cloud.

IBM J. Res. Dev., 2014

2013

PseudoApp: Performance prediction for application migration to cloud.

Proceedings of the 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013

Dissecting Open Source Cloud Evolution: An OpenStack Case Study.

Proceedings of the 5th USENIX Workshop on Hot Topics in Cloud Computing, 2013

CAP3: A Cloud Auto-Provisioning Framework for Parallel Processing Using On-Demand and Spot Instances.

Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, Santa Clara, CA, USA, June 28, 2013

2012

Towards an Understanding of Oversubscription in Cloud.

Salman Abdul Baset

Chunqiang Tang

Proceedings of the 2nd USENIX Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (Hot-ICE '12), 2012

Remediating Overload in Over-Subscribed Computing Environments.

Rafah Hosn

Chunqiang Tang

Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

2010

Providing application-aware reliability through OS/hypervisor-level techniques.

PhD thesis, 2010

Checkpointing virtual machines against transient errors.

Proceedings of the 16th IEEE International On-Line Testing Symposium (IOLTS 2010), 2010

2008

Formalizing System Behavior for Evaluating a System Hang Detector.

Zbigniew Kalbarczyk

Ravishankar K. Iyer

Proceedings of the 27th IEEE Symposium on Reliable Distributed Systems (SRDS 2008), 2008

2007

Reliability MicroKernel: Providing Application-Aware Reliability in the OS.

IEEE Trans. Reliab., 2007

2006

An OS-level Framework for Providing Application-Aware Reliability.

Proceedings of the 12th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2006), 2006

2005

RTES demo system 2004.

SIGBED Rev., 2005

Application Fault Tolerance with Armor Middleware.

Zbigniew Kalbarczyk

Ravishankar K. Iyer