Devesh Tiwari

According to our database, Devesh Tiwari authored at least 50 papers between 2010 and 2020.

Bibliography

2020
Resilience and coevolution of preferential interdependent networks.
Social Netw. Analys. Mining, 2020

Comparing Performances of Five Distinct Automatic Classifiers for Fin Whale Vocalizations in Beamformed Spectrograms of Coherent Hydrophone Array.
Remote Sensing, 2020

2019
An Analysis Workflow-Aware Storage System for Multi-Core Active Flash Arrays.
IEEE Trans. Parallel Distrib. Syst., 2019

Two stage cluster for resource optimization with Apache Mesos.
CoRR, 2019

Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected.
Proceedings of the International Conference for High Performance Computing, 2019

Characterizing Disk Health Degradation and Proactively Protecting Against Disk Failures for Reliable Storage Systems.
Proceedings of the 2019 IEEE International Conference on Autonomic Computing, 2019

PERQ: Fair and Efficient Power Management of Power-Constrained Large-Scale Computing Systems.
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

PCFI: Program Counter Guided Fault Injection for Accelerating GPU Reliability Assessment.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

What does Vibration do to Your SSD?
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Towards Enabling Dynamic Resource Estimation and Correction for Improving Utilization in an Apache Mesos Cloud Environment.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

Exploring Potential for Non-Disruptive Vertical Auto Scaling and Resource Estimation in Kubernetes.
Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019

2018
Exploring the Optimal Platform Configuration for Power-Constrained HPC Workflows.
Proceedings of the 27th International Conference on Computer Communication and Networks, 2018

Machine Learning Models for GPU Error Prediction in a Large Scale HPC System.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018

Understanding and Analyzing Interconnect Errors and Network Congestion on a Large Scale HPC System.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018

Shiraz: Exploiting System Reliability and Application Resilience Characteristics to Improve Large Scale System Throughput.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018

Reliability Characterization of Solid State Drives in a Scalable Production Datacenter.
Proceedings of the IEEE International Conference on Big Data, 2018

Resilience and the Coevolution of Interdependent Multiplex Networks.
Proceedings of the IEEE/ACM 2018 International Conference on Advances in Social Networks Analysis and Mining, 2018

2017
Obtaining and Managing Answer Quality for Online Data-Intensive Services.
TOMPECS, 2017

Compiler-Directed Soft Error Detection and Recovery to Avoid DUE and SDC via Tail-DMR.
ACM Trans. Embedded Comput. Syst., 2017

GUIDE: a scalable information directory service to collect, federate, and analyze logs for operational insights into a leadership HPC facility.
Proceedings of the International Conference for High Performance Computing, 2017

Failures in large scale systems: long-term measurement, analysis, and implications.
Proceedings of the International Conference for High Performance Computing, 2017

Toward Managing HPC Burst Buffers Effectively: Draining Strategy to Regulate Bursty I/O Behavior.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Effective Running of End-to-End HPC Workflows on Emerging Heterogeneous Architectures.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Application configuration selection for energy-efficient execution on multicore systems.
J. Parallel Distributed Comput., 2016

Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery.
Proceedings of the International Conference for High Performance Computing, 2016

Granularity and the cost of error recovery in resilient AMR scientific applications.
Proceedings of the International Conference for High Performance Computing, 2016

Low-cost soft error resilience with unified data verification and fine-grained recovery for acoustic sensor based detection.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Reducing Waste in Extreme Scale Systems through Introspective Analysis.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Adaptive Power Profiling for Many-Core HPC Architectures.
Proceedings of the 2016 IEEE International Conference on Autonomic Computing, 2016

A large-scale study of soft-errors on GPUs in the field.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Power-Capping Aware Checkpointing: On the Interplay Among Power-Capping, Temperature, Reliability, Performance, and Energy.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016

2015
A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems.
Proceedings of the International Conference for High Performance Computing, 2015

Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge leadership computing facility.
Proceedings of the International Conference for High Performance Computing, 2015

AnalyzeThis: an analysis workflow-aware storage system.
Proceedings of the International Conference for High Performance Computing, 2015

Clover: Compiler Directed Lightweight Soft Error Resilience.
Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, 2015

Measuring and Managing Answer Quality for Online Data-Intensive Services.
Proceedings of the 2015 IEEE International Conference on Autonomic Computing, 2015

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

Low Power Job Scheduler for Supercomputers: A Rule-Based Power-Aware Scheduler.
Proceedings of the IEEE International Conference on Data Science and Data Intensive Systems, 2015

2014
Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems.
Proceedings of the International Conference for High Performance Computing, 2014

MapReuse: Reusing Computation in an In-Memory MapReduce System.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Improving large-scale storage system performance via topology-aware and balanced data placement.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

2013
Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines.
Proceedings of the 11th USENIX conference on File and Storage Technologies, 2013

2012
Reducing Data Movement Costs Using Energy-Efficient, Active Computation on SSD.
Proceedings of the 2012 Workshop on Power-Aware Computing Systems, HotPower'12, 2012

Architectural characterization and similarity analysis of sunspider and Google's V8 Javascript benchmarks.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Modeling and Analyzing Key Performance Factors of Shared Memory MapReduce.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010
MMT: Exploiting fine-grained parallelism in dynamic memory management.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
