# Laxmikant V. Kalé

Affiliations:
• University of Illinois, USA

According to our database, Laxmikant V. Kalé authored at least 310 papers between 1984 and 2022.

## ACM Fellow

ACM Fellow 2017, "For development of new parallel programming techniques and their deployment in high performance computing applications".

## IEEE Fellow

IEEE Fellow 2011, "For development of parallel programming techniques".

## Bibliography

2022
Improving Scalability with GPU-Aware Asynchronous Tasks.
CoRR, 2022

2021
Introduction to the Special Issue on PADS 2019.
ACM Trans. Model. Comput. Simul., 2021

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py.
CoRR, 2021

GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

CharminG: A Scalable GPU-resident Runtime System.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py.
Proceedings of the 6th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2021

Accelerating Messages by Avoiding Copies in an Asynchronous Task-based Programming Model.
Proceedings of the 6th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2021

2020
Optimizing point-to-point communication between adaptive MPI endpoints in shared memory.
Concurr. Comput. Pract. Exp., 2020

Heterogeneous computing with OpenMP and Hydra.
Concurr. Comput. Pract. Exp., 2020

Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems.
Proceedings of the 5th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2020

Unified data movement for offloading Charm++ applications.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

End-to-end performance modeling of distributed GPU applications.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

2019
Scalable GW software for quasiparticle properties using OpenAtom.
Comput. Phys. Commun., 2019

Histogram Sort with Sampling.
Proceedings of the 31st ACM on Symposium on Parallelism in Algorithms and Architectures, 2019

An Adaptive Non-Blocking GVT Algorithm.
Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2019

Fine-Grained Energy Efficiency Using Per-Core DVFS with an Adaptive Runtime System.
Proceedings of the Tenth International Green and Sustainable Computing Conference, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

Scalable molecular dynamics with NAMD on the Summit system.
IBM J. Res. Dev., 2018

Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Adaptive Methods for Irregular Parallel Discrete Event Simulation Workloads.
Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2018

CharmPy: A Python Parallel Programming Model.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Multi-Level Load Balancing with an Integrated Runtime Approach.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Energy-optimal configuration selection for manycore chips with variation.
Int. J. High Perform. Comput. Appl., 2017

Visualizing, Measuring, and Tuning Adaptive MPI Parameters.
Proceedings of the Programming and Performance Visualization Tools, 2017

Integrating OpenMP into the Charm++ Programming Model.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Improving the memory access locality of hybrid MPI applications.
Proceedings of the 24th European MPI Users' Group Meeting, 2017

POSTER: Automated Load Balancer Selection Based on Application Characteristics.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

A Memory Heterogeneity-Aware Runtime System for Bandwidth-Sensitive HPC Applications.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Automatic topology mapping of diverse large-scale parallel applications.
Proceedings of the International Conference on Supercomputing, 2017

Support for Power Efficient Proactive Cooling Mechanisms.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Runtime Techniques for Programming with Fast and Slow Memory.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Evaluating and Improving the Performance and Scheduling of HPC Applications in Cloud.
IEEE Trans. Cloud Comput., 2016

Solvers for $\mathcal{O}(N)$ Electronic Structure in the Strong Scaling Limit.
SIAM J. Sci. Comput., 2016

Power, Reliability, and Performance: One System to Rule them All.
Computer, 2016

OpenAtom: Scalable Ab-Initio Molecular Dynamics with Diverse Capabilities.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Runtime Coordinated Heterogeneous Tasks in Charm++.
Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, 2016

FlipBack: automatic targeted protection against silent data corruption.
Proceedings of the International Conference for High Performance Computing, 2016

Evaluating HPC networks via simulation of parallel workloads.
Proceedings of the International Conference for High Performance Computing, 2016

Neural Network-Based Task Scheduling with Preemptive Fan Control.
Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, 2016

Towards PDES in a Message-Driven Paradigm: A Preliminary Case Study Using Charm++.
Proceedings of the 2016 annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation, 2016

Mitigating Processor Variation through Dynamic Load Balancing.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Variation Among Processors Under Turbo Boost in HPC Systems.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers.
IEEE Trans. Parallel Distributed Syst., 2015

Power Management of Extreme-Scale Networks with On/Off Links in Runtime Systems.
ACM Trans. Parallel Comput., 2015

Camel: collective-aware message logging.
J. Supercomput., 2015

A Fault-Tolerance Protocol for Parallel Applications with Communication Imbalance.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Energy-efficient computing for HPC workloads on heterogeneous manycore chips.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

A Batch System with Efficient Adaptive Scheduling for Malleable and Evolving Applications.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Scalable Asynchronous Contact Mechanics Using Charm++.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Analyzing Energy-Time Tradeoff in Power Overprovisioned HPC Data Centers.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Charm++ and MPI: Combining the Best of Both Worlds.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

HIPS-LSPP Keynotes.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Identifying the Culprits Behind Network Congestion.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014
Structure-adaptive parallel solution of sparse triangular linear systems.
Parallel Comput., 2014

Energy profile of rollback-recovery strategies in high performance computing.
Parallel Comput., 2014

Solvers for $\mathcal{O} (N)$ Electronic Structure in the Strong Scaling Limit.
CoRR, 2014

Using an Adaptive HPC Runtime System to Reconfigure the Cache Hierarchy.
Proceedings of the International Conference for High Performance Computing, 2014

Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget.
Proceedings of the International Conference for High Performance Computing, 2014

Mapping to Irregular Torus Topologies and Other Techniques for Petascale Biomolecular Simulation.
Proceedings of the International Conference for High Performance Computing, 2014

Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing.
Proceedings of the International Conference for High Performance Computing, 2014

Maximizing Throughput on a Dragonfly Network.
Proceedings of the International Conference for High Performance Computing, 2014

Parallel Programming with Migratable Objects: Charm++ in Practice.
Proceedings of the International Conference for High Performance Computing, 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

PICS: a performance-analysis-based introspective control system to steer parallel applications.
Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, 2014

TRAM: Optimizing Fine-Grained Communication with Topological Routing and Aggregation of Messages.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Scaling the ISAM Land Surface Model through Parallelization of Inter-component Data Transfer.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Dynamic load balancing in GPU-based systems for a MPI program.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Towards realizing the potential of malleable jobs.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Optimizing the performance of parallel applications on a 5D torus via task mapping.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Scalable replay with partial-order dependencies for message-logging fault tolerance.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

Controlling Concurrency and Expressing Synchronization in Charm++ Programs.
Proceedings of the Concurrent Objects and Beyond, 2014

2013
Dynamic Load Balancing in GPU-Based Systems - Early Experiments.
CoRR, 2013

A 'cool' way of improving the reliability of HPC machines.
Proceedings of the International Conference for High Performance Computing, 2013

ACR: automatic checkpoint/restart for soft and hard error protection.
Proceedings of the International Conference for High Performance Computing, 2013

A distributed dynamic load balancer for iterative applications.
Proceedings of the International Conference for High Performance Computing, 2013

Predicting application performance using supervised learning on communication features.
Proceedings of the International Conference for High Performance Computing, 2013

Adoption protocols for fanout-optimal fault-tolerant termination detection.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Steal Tree: low-overhead tracing of work stealing schedulers.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

Toward Runtime Power Management of Exascale Networks by on/off Control of Links.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Towards Efficient Mapping, Scheduling, and Execution of HPC Applications on Platforms in Cloud.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems.
Proceedings of the International Conference on Supercomputing, 2013

Characteristics of *adaptive* runtime systems in HPC.
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, 2013

HPC-Aware VM Placement in Infrastructure Clouds.
Proceedings of the 2013 IEEE International Conference on Cloud Engineering, 2013

Parallel branch-and-bound for two-stage stochastic integer optimization.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Thermal aware automated load balancing for HPC applications.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

The Who, What, Why, and How of High Performance Computing in the Cloud.
Proceedings of the IEEE 5th International Conference on Cloud Computing Technology and Science, 2013

Improving HPC Application Performance in Cloud through Dynamic Load Balancing.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

N-body Simulations with ChaNGa.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

Scalable Molecular Dynamics with NAMD.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

OpenAtom: Ab initio Molecular Dynamics for Petascale Platforms.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

The Charm++ Programming Model.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

Designing Charm++ Programs.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

Tools for Debugging and Performance Analysis.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

2012
"Cool" Load Balancing for High Performance Computing Data Centers.
IEEE Trans. Computers, 2012

Using shared arrays in message-driven parallel programs.
Parallel Comput., 2012

Poster: Evaluating Topology Mapping via Graph Partitioning.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Evaluating Topology Mapping via Graph Partitioning.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Parallelizing Information Set Generation for Game Tree Search Applications.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Scalable Algorithms for Distributed-Memory Adaptive Mesh Refinement.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Scalable Algorithms for Constructing Balanced Spanning Trees on System-Ranked Process Groups.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Collectives on Two-Tier Direct Networks.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Mapping Dense LU Factorization on Multicore Supercomputer Nodes.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Simulating the Spread of Infectious Disease over Large Realistic Social Networks Using Charm++.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Efficient 'Cool Down' of Parallel Applications.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Cloud Friendly Load Balancing for HPC Applications: Preliminary Work.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Performance Optimization of a Parallel, Two Stage Stochastic Linear Program.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Work stealing and persistence-based load balancers for iterative overdecomposed applications.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Exploring the performance and mapping of HPC applications to platforms in the cloud.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

A scalable double in-memory checkpoint and restart scheme towards exascale.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

A message-logging protocol for multicore systems.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Automated Load Balancing Invocation Based on Application Characteristics.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011
Car-Parrinello Method.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Sorting.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Combinatorial Search.
Proceedings of the Encyclopedia of Parallel Computing, 2011

NAMD (NAnoscale Molecular Dynamics).
Proceedings of the Encyclopedia of Parallel Computing, 2011

Charm++.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Load Balancing, Distributed Memory.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Programming heterogeneous clusters with accelerators using object-based programming.
Sci. Program., 2011

Parssse: an Adaptive Parallel State Space Search Engine.
Parallel Process. Lett., 2011

Detecting and Using Critical Paths at Runtime in Message Driven Parallel Programs.
Int. J. Netw. Comput., 2011

Periodic hierarchical load balancing for large supercomputers.
Int. J. High Perform. Comput. Appl., 2011

Optimizing communication for Charm++ applications by reducing network contention.
Concurr. Comput. Pract. Exp., 2011

ACM SRC poster: optimizing all-to-all algorithm for PERCS network using simulation.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

A 'cool' load balancer for parallel applications.
Proceedings of the Conference on High Performance Computing Networking, 2011

Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime.
Proceedings of the Conference on High Performance Computing Networking, 2011

Poster: enabling massive parallelism for stochastic optimization.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Avoiding hot-spots on two-level direct networks.
Proceedings of the Conference on High Performance Computing Networking, 2011

An Adaptive Framework for Large-Scale State Space Search.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Temperature Aware Load Balancing for Parallel Applications: Preliminary Work.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Programming Heterogeneous Systems.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Automatic Handling of Global Variables for Multi-threaded MPI Programs.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Simulation-Based Performance Analysis and Tuning for a Two-Level Directly Connected System.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Heuristic-Based Techniques for Mapping Irregular Communication Graphs to Mesh Topologies.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Optimizing multicore performance with message driven execution: A case study.
Proceedings of the 18th International Conference on High Performance Computing, 2011

On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar.
Int. J. High Perform. Comput. Appl., 2010

Optimizing a parallel runtime system for multicore clusters: a case study.
Proceedings of the 2010 TeraGrid Conference, 2010

Scaling Hierarchical N-body Simulations on GPU Clusters.
Proceedings of the Conference on High Performance Computing Networking, 2010

A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

Debugging Large Scale Applications in a Virtualized Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Robust non-intrusive record-replay with processor extraction.
Proceedings of the 8th Workshop on Parallel and Distributed Systems: Testing, 2010

Highly scalable parallel sorting.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Static macro data flow: Compiling global control into local control.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Detecting and using critical paths at runtime in message driven parallel programs.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Simulating Large Scale Parallel Applications Using Statistical Models for Sequential Execution Blocks.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Optimizing an MPI weather forecasting model via processor virtualization.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

A study of memory-aware scheduling in message driven parallel programs.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Automated mapping of regular communication graphs on mesh interconnects.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Automatic MPI to AMPI Program Transformation Using Photran.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

Team-Based Message Logging: Preliminary Results.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Accelerator Support in the Charm++ Parallel Programming Model.
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010

2009
Quantifying Network Contention on Large Parallel Machines.
Parallel Process. Lett., 2009

Parallel Simulations of Dynamic Fracture Using Extrinsic Cohesive Elements.
J. Sci. Comput., 2009

Early Application Development/Tuning and Application Characterization/Segmentation.
Int. J. High Perform. Comput. Appl., 2009

Programming Models at Exascale: Adaptive Runtime Systems, Incomplete Simple Languages, and Interoperability.
Int. J. High Perform. Comput. Appl., 2009

Toward Exascale Resilience.
Int. J. High Perform. Comput. Appl., 2009

Towards a framework for abstracting accelerators in parallel applications: experience with cell.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Topology aware task mapping techniques: an api and case study.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Dynamic high-level scripting in parallel applications.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

An evaluative study on the effect of contention on message latencies in large supercomputers.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Dynamic topology aware load balancing algorithms for molecular dynamics applications.
Proceedings of the 23rd international conference on Supercomputing, 2009

CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm.
Proceedings of the ICPPW 2009, 2009

Integrated Performance Views in Charm++: Projections Meets TAU.
Proceedings of the ICPP 2009, 2009

Continuous performance monitoring for large-scale parallel applications.
Proceedings of the 16th International Conference on High Performance Computing, 2009

A Case Study of Communication Optimizations on 3D Mesh Interconnects.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

09191 Abstracts Collection - Fault Tolerance in High-Performance Computing and Grids.
Proceedings of the Fault Tolerance in High-Performance Computing and Grids, 03.05., 2009

2008
Benefits of Topology Aware Mapping for Mesh Interconnects.
Parallel Process. Lett., 2008

Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system.
IBM J. Res. Dev., 2008

Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.
IBM J. Res. Dev., 2008

Parallel adaptive simulations of dynamic fracture events.
Eng. Comput., 2008

A Case Study in Tightly Coupled Multi-paradigm Parallel Programming.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Memory tagging in Charm++.
Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, 2008

Towards scalable performance analysis and visualization through data reduction.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Massively parallel cosmological simulations with ChaNGa.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

NoiseMiner: An algorithm for scalable automatic computational noise and software interference detection.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Overcoming scaling challenges in biomolecular simulations across multiple platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Application-specific topology-aware mapping for three dimensional topologies.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

The Excitement in Parallel Computing.
Proceedings of the High Performance Computing, 2008

2007
Optimizing Distributed Application Performance Using Dynamic Grid Topology-Aware Load Balancing.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Fault Tolerance Protocol with Fast Fault Recovery.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Charisma: orchestrating migratable parallel objects.
Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 2007

2006
Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++.
ACM SIGOPS Oper. Syst. Rev., 2006

HPC-Colony: services and interfaces for very large systems.
ACM SIGOPS Oper. Syst. Rev., 2006

Parallelization of a level set method for simulating dendritic growth.
J. Parallel Distributed Comput., 2006

Scaling applications to massively parallel machines using Projections performance analysis tool.
Future Gener. Comput. Syst., 2006

ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications.
Eng. Comput., 2006

Scalable Cosmological Simulations on Parallel Machines.
Proceedings of the High Performance Computing for Computational Science, 2006

Poster reception - Charm++ simplifies coding for the cell processor.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Cosmological simulations on supercomputers.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Performance evaluation of adaptive MPI.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Achieving strong scaling with NAMD on Blue Gene/L.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

New parallel programming abstractions and the role of compilers.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Support for adaptivity in ARMCI using migratable objects.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Topology-aware task mapping for reducing communication contention on large parallel machines.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Multiple Flows of Control in Migratable Parallel Programs.
Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPP Workshops 2006), 2006

Proactive Fault Tolerance in MPI Applications Via Task Migration.
Proceedings of the High Performance Computing, 2006

Parallel Computational Biology.
Proceedings of the Parallel Processing for Scientific Computing, 2006

2005
Scalable molecular dynamics with NAMD.
J. Comput. Chem., 2005

Simulation-Based Performance Prediction for Large Parallel Machines.
Int. J. Parallel Program., 2005

Parallel VHDL simulation.
Proceedings of the 37th Winter Simulation Conference, Orlando, FL, USA, December 4-7, 2005, 2005

Scaling an optimistic parallel simulation of large-scale interconnection networks.
Proceedings of the 37th Winter Simulation Conference, Orlando, FL, USA, December 4-7, 2005, 2005

Performance Prediction Using Simulation of Large-Scale Interconnection Networks in POSE.
Proceedings of the 19th Workshop on Parallel and Distributed Simulation, 2005

Using Message-Driven Objects to Mask Latency in Grid Computing Applications.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Improved Point-to-Point and Collective Communication Performance with Output-Queued High-Radix Routers.
Proceedings of the High Performance Computing, 2005

2004
Performance and modularity benefits of message-driven execution.
J. Parallel Distributed Comput., 2004

Scalable fine-grained parallelization of plane-wave-based ab initio molecular dynamics for large supercomputers.
J. Comput. Chem., 2004

An orchestration language for parallel objects.
Proceedings of the 7th Workshop on languages, 2004

MSA: Multiphase Specifically Shared Arrays.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Performance Modeling and Programming Environments for Petaflops Computers and the Blue Gene Machine.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Opportunities and Challenges of Modern Communication Architectures: Case Study with QsNet.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Debugging Support for Charm++.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Fault Tolerant Protocol for Massively Parallel Systems.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

POSE: Getting Over Grainsize in Parallel Discrete Event Simulation.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Faucets: Efficient Resource Allocation on the Computational Grid.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Scaling All-to-All Multicast on Fat-tree Networks.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
Supporting dynamic parallel object arrays.
Concurr. Comput. Pract. Exp., 2003

Adaptive MPI.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

A Framework for Collective Personalized Communication.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Scaling Molecular Dynamics to 3000 Processors with Projections: A Performance Analysis Case Study.
Proceedings of the Computational Science - ICCS 2003, 2003

Jade: A Parallel Message-Driven Java.
Proceedings of the Computational Science - ICCS 2003, 2003

2002
NAMD: biomolecular simulation on thousands of processors.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

A Parallel-Object Programming Model for PetaFLOPS Machines and Blue Gene/Cyclops.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

A voxel-based parallel collision detection algorithm.
Proceedings of the 16th international conference on Supercomputing, 2002

A Malleable-Job System for Timeshared Parallel Machines.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

2001
An Interface Model for Parallel Components.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Emulating PetaFLOPS Machines and Blue Gene.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Adaptive Load Balancing for MPI Programs.
Proceedings of the Computational Science - ICCS 2001, 2001

2000
Scalable Molecular Dynamics for Large Biomolecular Systems.
Proceedings of Supercomputing 2000, 2000

Workshop on Run-Time Systems for Parallel Programming (RTSPP).
Proceedings of the Parallel and Distributed Processing, 2000

Run-Time Support for Adaptive Load Balancing.
Proceedings of the Parallel and Distributed Processing, 2000

A New Approach to Software Integration Frameworks for Multi-physics Simulation Codes.
Proceedings of the Architecture of Scientific Software, 2000

A Parallel Framework for Explicit FEM.
Proceedings of the High Performance Computing, 2000

1999
Multilingual Debugging Support for Data-Driven and Thread-Based Parallel Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

Branch and Bound Based Load Balancing for Parallel Applications.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1999

Application Performance of a Linux Cluster Using Converse.
Proceedings of the Parallel and Distributed Processing, 1999

Avoiding Algorithmic Obfuscation in a Message-Driven Parallel MD Code.
Proceedings of the Computational Molecular Dynamics: Challenges, Methods, Ideas, 1999

1998
Static Networks: A Powerful and Elegant Extension to Concurrent Object-Oriented Languages.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1998

Load Balancing in Parallel Molecular Dynamics.
Proceedings of the Solving Irregularly Structured Problems in Parallel, 1998

Multiparadigm, Multilingual Interoperability: Experience with Converse.
Proceedings of the Parallel and Distributed Processing, 10 IPPS/SPDP'98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, Orlando, Florida, USA, March 30, 1998

1997
Design and Implementation of Parallel Java with Global Object Space.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

NAMD: A Case Study in Multilingual Parallel Programming.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

1996
NAMD: a Parallel, Object-Oriented Molecular Dynamics Program.
Int. J. High Perform. Comput. Appl., 1996

Automating Runtime Optimizations for Load Balancing in Irregular Problems.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996

Threads for Interoperable Parallel Programming.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

Converse: An Interoperable Framework for Parallel Programming.
Proceedings of IPPS '96, 1996

Automating Parallel Runtime Optimizations Using Post-Mortem Analysis.
Proceedings of the 10th international conference on Supercomputing, 1996

Towards Automatic Performance Analysis.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

Simulating Message-Driven Programs.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

Structured Dagger: A Coordination Language for Message-Driven Programming.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
Efficient Parallel Graph Coloring with Prioritization.
Proceedings of the Parallel Symbolic Languages and Systems, 1995

Modularity, Reuse and Efficiency with Message-Driven Libraries.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Agents: An Undistorted Representation of Problem Structure.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Compiling Portable Message-Driven Programs.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

A Parallel Adaptive Fast Multipole Algorithm for n-Body Problems.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
Machine Independent AND and OR Parallel Execution of Logic Programs: Part II-Compiled Execution.
IEEE Trans. Parallel Distributed Syst., 1994

Machine Independent AND and OR Parallel Execution of Logic Programs: Part I-The Binding Environment.
IEEE Trans. Parallel Distributed Syst., 1994

Information Sharing Mechanisms in Parallel Programs.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

Dagger: Combining Benefits of Synchronous and Asynchronous Communication Styles.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

1993
Efficient implementation of concurrent object-oriented programs.
Proceedings of the Addendum to the Proceedings on Object-Oriented Programming Systems, 1993

CHARM++: A Portable Concurrent Object Oriented System Based On C++.
Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, 1993

Loop Transformations for Prolog Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

A Load Balancing Strategy for Prioritized Execution of Tasks.
Proceedings of the Seventh International Parallel Processing Symposium, 1993

A Comparison Based Parallel Sorting Algorithm.
Proceedings of the 1993 International Conference on Parallel Processing, 1993

1992
A join algorithm for combining AND parallel solutions in AND/OR parallel systems.
Int. J. Parallel Program., 1992

Prioritization in Parallel Symbolic Computing.
Proceedings of the Parallel Symbolic Computing: Languages, 1992

Estimating the Inherent Parallelism in Prolog Programs.
Proceedings of the International Conference on Fifth Generation Computer Systems. FGCS 1992, 1992

1991
Chare Kernel - a Runtime Support System for Parallel Computations.
J. Parallel Distributed Comput., 1991

The Reduce-Or Process Model for Parallel Execution of Logic Programs.
J. Log. Program., 1991

High level support for divide-and-conquer parallelism.
Proceedings of Supercomputing '91, 1991

Implementation of a Parallel Prolog Interpreter on Multiprocessors.
Proceedings of the Fifth International Parallel Processing Symposium, Proceedings, Anaheim, California, USA, April 30, 1991

Fortran-Style Transformations for Functional Programs.
Proceedings of the International Conference on Parallel Processing, 1991

Supporting Machine Independent Programming on Diverse Parallel Architectures.
Proceedings of the International Conference on Parallel Processing, 1991

1990
An Almost Perfect Heuristic for the N Nonattacking Queens Problem.
Inf. Process. Lett., 1990

Parallel state-space search for a first solution with consistent linear speedups.
Int. J. Parallel Program., 1990

Joining AND Parallel Solutions in AND/OR Parallel Systems.
Proceedings of the Logic Programming, Proceedings of the 1990 North American Conference, Austin, Texas, USA, October 29, 1990

A Chare Kernel Implementation of a Parallel Prolog Compiler.
Proceedings of the Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1990

The Chare Kernel Parallel Programming Language and System.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

Consistent Linear Speedups to a First Solution in Parallel State-Space Search.
Proceedings of the 8th National Conference on Artificial Intelligence. Boston, Massachusetts, USA, July 29, 1990

1989
Obtaining First Solutions Faster in AND-OR Parallel Execution of Logic Programs.
Proceedings of the Logic Programming, 1989

Compiled Execution of the Reduce-OR Process Model on Multiprocessors.
Proceedings of the Logic Programming, 1989

A dynamic scheduling strategy for the Chare-Kernel system.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989, 1989

An Abstract Machine for the Reduce-OR Process Model for Parallel Prolog.
Proceedings of the Knowledge Based Computer Systems, 1989

The Chare-Kernel Base Language: Preliminary Performance Results.
Proceedings of the International Conference on Parallel Processing, 1989

A Specialized Expert System for Judicial Decision Support.
Proceedings of the Second International Conference on Artificial Intelligence and Law, 1989

The Mesh Superceded?
Proceedings of the Computer Trends in the 1990s, 1989

1988
OR parallel execution of Prolog programs with side effects.
J. Supercomput., 1988

Comparing the Performance of Two Dynamic Load Distribution Methods.
Proceedings of the International Conference on Parallel Processing, 1988

A Memory Organization Independent Binding Environment for AND and OR Parallel Execution of Logic Programs.
Proceedings of the Logic Programming, 1988

Prolog at the University of Illinois.
Proceedings of the COMPCON'88, Digest of Papers, Thirty-Third IEEE Computer Society International Conference, San Francisco, California, USA, February 29, 1988

A Tree Representation for Parallel Problem Solving.
Proceedings of the 7th National Conference on Artificial Intelligence, 1988

1987
'Completeness' and 'Full Parallelism' of Parallel Logic Programming Schemes.
Proceedings of the 1987 Symposium on Logic Programming, San Francisco, California, USA, August 31, 1987

The REDUCE-OR Process Model for Parallel Evaluation of Logic Programs.
Proceedings of the Logic Programming, 1987

1986
Optimal Communication Neighborhoods.
Proceedings of the International Conference on Parallel Processing, 1986

1985
Lattice-Mesh: A Multi-Bus Architecture.
Proceedings of the International Conference on Parallel Processing, 1985

1984
Executing Distributed Prolog Programs on a Broadcast Network.
Proceedings of the 1984 International Symposium on Logic Programming, 1984

A Class of Architectures for a Prolog Machine.
Proceedings of the Second International Logic Programming Conference, 1984