Sriram Krishnamoorthy

ORCID: 0000-0002-4682-1002

According to our database, Sriram Krishnamoorthy authored at least 152 papers between 2003 and 2022.

Bibliography

2022
TAMM: Tensor Algebra for Many-body Methods.
CoRR, 2022

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
PaKman: A Scalable Algorithm for Generating Genomic Contigs on Distributed Memory Machines.
IEEE Trans. Parallel Distributed Syst., 2021

GFCCLib: Scalable and efficient coupled-cluster Green's function library for accurately tackling many-body electronic structure problems.
Comput. Phys. Commun., 2021

SV-sim: scalable PGAS-based state vector simulation of quantum circuits.
Proceedings of the International Conference for High Performance Computing, 2021

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
Reliability Analysis for Unreliable FSM Computations.
ACM Trans. Archit. Code Optim., 2020

FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation.
ACM Trans. Archit. Code Optim., 2020

FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation.
ACM Trans. Archit. Code Optim., 2020

Analytical Modeling and Design of Gallium Oxide Schottky Barrier Diodes Beyond Unipolar Figure of Merit Using High-k Dielectric Superjunction Structures.
CoRR, 2020

Design of a β-Ga₂O₃ Schottky Barrier Diode With p-type III-Nitride Guard Ring for Enhanced Breakdown.
CoRR, 2020

An Abstraction-guided Approach to Scalable and Rigorous Floating-Point Error Analysis.
CoRR, 2020

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters.
Proceedings of the International Conference for High Performance Computing, 2020

Scalable heterogeneous execution of a coupled-cluster model with perturbative triples.
Proceedings of the International Conference for High Performance Computing, 2020

Scalable yet rigorous floating-point error analysis.
Proceedings of the International Conference for High Performance Computing, 2020

COMET: A Domain-Specific Compilation of High-Performance Computational Chemistry.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

2019
Extracting SIMD Parallelism from Recursive Task-Parallel Programs.
ACM Trans. Parallel Comput., 2019

Q# and NWChem: Tools for Scalable Quantum Chemistry on Quantum Computers.
CoRR, 2019

An efficient mixed-mode representation of sparse tensors.
Proceedings of the International Conference for High Performance Computing, 2019

Toward generalized tensor algebra for ab initio quantum chemistry methods.
Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, 2019

NoC-enabled software/hardware co-design framework for accelerating k-mer counting.
Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, 2019

PaKman: Scalable Assembly of Large Genomes on Distributed Memory Machines.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

BonVoision: leveraging spatial data smoothness for recovery from memory soft errors.
Proceedings of the ACM International Conference on Supercomputing, 2019

Performance Models for Data Transfers: A Case Study with Molecular Chemistry Kernels.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Ground-Truth Prediction to Accelerate Soft-Error Impact Analysis for Iterative Methods.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Mapping Arbitrarily Sparse Two-Body Interactions on One-Dimensional Quantum Circuits.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Towards Predicting the Impact of Roll-Forward Failure Recovery for HPC Applications.
Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

A Code Generator for High-Performance Tensor Contractions on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

MULKSG: MULtiple K Simultaneous Graph Assembly.
Proceedings of the Algorithms for Computational Biology - 6th International Conference, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks.
ACM Trans. Archit. Code Optim., 2018

Exploring the capabilities of support vector machines in detecting silent data corruptions.
Sustain. Comput. Informatics Syst., 2018

Analytical modeling of cache behavior for affine programs.
Proc. ACM Program. Lang., 2018

HPC Software Verification in Action: A Case Study with Tensor Transposition.
Proceedings of the 2nd IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2018

Performance modeling for GPUs using abstract kernel emulation.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

GPU code optimization using abstract kernel emulation and sensitivity analysis.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

TTLG - An Efficient Tensor Transposition Library for GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Quantification, Trade-off Analysis, and Optimal Checkpoint Placement for Reliability and Availability.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Characterization of the Impact of Soft Errors on Iterative Methods.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Lightweight detection of cache conflicts.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

On the theory of speculative checkpointing: time and energy considerations.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

Comparative analysis of soft-error detection strategies: a case study with iterative methods.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

Understanding Scale-Dependent Soft-Error Behavior of Scientific Applications.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Report of the HPC Correctness Summit, Jan 25-26, 2017, Washington, DC.
CoRR, 2017

Automatic Risk-based Selective Redundancy for Fault-tolerant Task-parallel HPC Applications.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Cache locality optimization for recursive programs.
Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2017

Efficient Cache Simulation for Affine Computations.
Proceedings of the Languages and Compilers for Parallel Computing, 2017

Localized Fault Recovery for Nested Fork-Join Programs.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Locality-Aware Dynamic Task Graph Scheduling.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Approximate Computing Techniques for Iterative Graph Algorithms.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Toward a General Theory of Optimal Checkpoint Placement.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Gaussian Process Approach for Effective Soft Error Detection.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

MACORD: Online Adaptive Machine Learning Framework for Silent Error Detection.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
User-Assisted Store Recycling for Dynamic Task Graph Schedulers.
ACM Trans. Archit. Code Optim., 2016

Static and Dynamic Frequency Scaling on Multicore CPUs.
ACM Trans. Archit. Code Optim., 2016

Work stealing for GPU-accelerated parallel programs in a global address space framework.
Concurr. Comput. Pract. Exp., 2016

A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment.
Proceedings of the International Conference for High Performance Computing, 2016

User-assisted storage reuse determination for dynamic task graphs.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

PolyCheck: dynamic verification of iteration space transformations on affine programs.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

Effective padding of multidimensional arrays to avoid cache conflict misses.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Towards Resiliency Evaluation of Vector Programs.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

On the Impact of Widening Vector Registers on Sequence Alignment.
Proceedings of the 45th International Conference on Parallel Processing, 2016

New-Sum: A Novel Online ABFT Scheme For General Iterative Methods.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

PRESAGE: Protecting Structured Address Generation against Soft Errors.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

On fusing recursive traversals of K-d trees.
Proceedings of the 25th International Conference on Compiler Construction, 2016

2015
Global transformations for legacy parallel applications via structural analysis and rewriting.
Parallel Comput., 2015

A work stealing based approach for enabling scalable optimal sequence homology detection.
J. Parallel Distributed Comput., 2015

CilkSpec: optimistic concurrency for Cilk.
Proceedings of the International Conference for High Performance Computing, 2015

Efficient execution of recursive programs on commodity vector hardware.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

HIPS-LSPP Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

On the Impact of Execution Models: A Case Study in Computational Chemistry.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

2014
Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.
J. Parallel Distributed Comput., 2014

Addressing failures in exascale computing.
Int. J. High Perform. Comput. Appl., 2014

A Communication-Optimal Framework for Contracting Distributed Tensors.
Proceedings of the International Conference for High Performance Computing, 2014

Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing.
Proceedings of the International Conference for High Performance Computing, 2014

Fault-Tolerant Dynamic Task Graph Scheduling.
Proceedings of the International Conference for High Performance Computing, 2014

Compiler-assisted detection of transient memory errors.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Checksumming Strategies for Data in Volatile Memories.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

CAST: Contraction Algorithm for Symmetric Tensors.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Scalable replay with partial-order dependencies for message-logging fault tolerance.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
A scalable infrastructure for the performance analysis of passive target synchronization.
Parallel Comput., 2013

Multi-Fault Tolerance for Cartesian Data Distributions.
Int. J. Parallel Program., 2013

Optimizing tensor contraction expressions for hybrid CPU-GPU execution.
Clust. Comput., 2013

A framework for load balancing of tensor contraction expressions via dynamic task partitioning.
Proceedings of the International Conference for High Performance Computing, 2013

Steal Tree: low-overhead tracing of work stealing schedulers.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Efficient scheduling of recursive control flow on GPUs.
Proceedings of the International Conference on Supercomputing, 2013

2012
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.
J. Parallel Distributed Comput., 2012

Performance characterization of global address space applications: a case study with NWChem.
Concurr. Comput. Pract. Exp., 2012

Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Data-driven fault tolerance for work stealing computations.
Proceedings of the International Conference on Supercomputing, 2012

On the Use of Term Rewriting for Performance Optimization of Legacy HPC Applications.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Work stealing and persistence-based load balancers for iterative overdecomposed applications.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Towards scalable optimal sequence homology detection.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Global Futures: A Multithreaded Execution Model for Global Arrays-based Applications.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Poster: FOX: a fault-oblivious extreme scale execution environment.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Scalable implementations of accurate excited-state coupled cluster theories: application of high-level methods to porphyrin-based systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Poster: High-level, one-sided programming models on MPI: a case study with global arrays and NWChem.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Noncollective Communicator Creation in MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Lifeline-based global load balancing.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Application-Specific Fault Tolerance via Data Access Characterization.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Tolerating correlated failures for generalized Cartesian distributions via bipartite matching.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies.
Proceedings of the Compiler Construction - 20th International Conference, 2011

Parameterized Micro-benchmarking: An Auto-tuning Approach for Complex Applications.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
High performance Molecular Dynamic simulation on single and multi-GPU systems.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Dynamic load balancing on single- and multi-GPU systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Scalable Communication Trace Compression.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Selective Recovery from Failures in a Task Parallel Programming Model.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2009

Scalable work stealing.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Parametric multi-level tiling of imperfectly nested loops.
Proceedings of the 23rd international conference on Supercomputing, 2009

Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband.
Proceedings of the 6th Conference on Computing Frontiers, 2009

Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors.
Proceedings of the PACT 2009, 2009

2008
Global trees: a framework for linked data structures on distributed memory parallel systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Towards effective automatic parallelization for multicore systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A compiler framework for optimization of affine loop nests for gpgpus.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Scioto: A Framework for Global-View Task Parallelism.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Integrated Data and Task Management for Scientific Applications.
Proceedings of the Computational Science, 2008

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.
Proceedings of the Compiler Construction, 17th International Conference, 2008

2007
Efficient search-space pruning for integrated fusion and tiling transformations.
Concurr. Comput. Pract. Exp., 2007

Effective automatic parallelization of stencil computations.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

A global address space framework for locality aware scheduling of block-sparse computations.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Non-collective parallel I/O for global address space programming models.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Layout transformation support for the disk resident arrays framework.
J. Supercomput., 2006

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver.
J. Parallel Distributed Comput., 2006

Data management and query - Hypergraph partitioning for automatic memory hierarchy management.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Blue Gene system software - Design and implementation of a one-sided communication interface for the IBM eServer Blue Gene® supercomputer.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

An extensible global address space framework with decoupled task and data abstractions.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

An Integrated Approach for Processor Allocation and Scheduling of Mixed-Parallel Applications.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations.
Proceedings of the Computational Science, 2006

Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O.
Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, 2006

Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

Combining analytical and empirical approaches in tuning matrix transposition.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models.
Proc. IEEE, 2005

Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Cache Miss Characterization and Data Locality Optimization for Imperfectly Nested Loops on Shared Memory Multiprocessors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Data and Computation Abstractions for Dynamic and Irregular Computations.
Proceedings of the High Performance Computing, 2005

2004
Efficient parallel out-of-core matrix transposition.
Int. J. High Perform. Comput. Netw., 2004

Empirical Performance-Model Driven Data Layout Optimization.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Efficient Layout Transformation for Disk-Based Multidimensional Arrays.
Proceedings of the High Performance Computing, 2004

2003
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

