Bronis R. de Supinski

ORCID: 0000-0002-0339-1006

According to our database, Bronis R. de Supinski authored at least 211 papers between 1999 and 2023.

Awards

ACM Fellow

ACM Fellow 2022, "For contributions to the design of large-scale systems and their programming systems and software".

Bibliography

2023
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems.
CoRR, 2023

LM4HPC: Towards Effective Language Model Application in High-Performance Computing.
Proceedings of the OpenMP: Advanced Task-Based, Device and Compiler Programming, 2023

2022
Data-driven global weather predictions at high resolutions.
Int. J. High Perform. Comput. Appl., 2022

An analytical performance model of generalized hierarchical scheduling.
Int. J. High Perform. Comput. Appl., 2022

Extending OpenMP to Support Automated Function Specialization Across Translation Units.
Proceedings of the OpenMP in a Modern World: From Multi-device Support to Meta Programming, 2022

Scalable Composition and Analysis Techniques for Massive Scientific Workflows.
Proceedings of the 18th IEEE International Conference on e-Science, 2022

2021
Mitigating Inter-Job Interference via Process-Level Quality-of-Service.
ACM Trans. Parallel Comput., 2021

Special Issue Introduction: The Gordon Bell Special Prize for HPC-Based COVID-19 Research Finalists.
Int. J. High Perform. Comput. Appl., 2021

Extending OpenMP for Machine Learning-Driven Adaptation.
Proceedings of the Accelerator Programming Using Directives - 8th International Workshop, 2021

Beyond Explicit Transfers: Shared and Managed Memory in OpenMP.
Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

Inter-loop optimization in RAJA using loop chains.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
Unified Sequential Optimization Directives in OpenMP.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019
Statistical and machine learning models for optimizing energy in parallel applications.
Int. J. High Perform. Comput. Appl., 2019


Ompparser: A Standalone and Unified OpenMP Parser.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

A Framework for Enabling OpenMP Autotuning.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Making OpenMP Ready for C++ Executors.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Extending OpenMP Metadirective Semantics for Runtime Adaptation.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

2018
The Ongoing Evolution of OpenMP.
Proc. IEEE, 2018

Big data and extreme-scale computing.
Int. J. High Perform. Comput. Appl., 2018


Energy efficiency modeling of parallel applications.
Proceedings of the International Conference for High Performance Computing, 2018

Extending OpenMP to Facilitate Loop Optimization.
Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

A Study of Network Quality of Service in Many-Core MPI Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017
ALEA: A Fine-Grained Energy Profiling Tool.
ACM Trans. Archit. Code Optim., 2017

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads.
ACM Trans. Archit. Code Optim., 2017

A survey on software methods to improve the energy efficiency of parallel computing.
Int. J. High Perform. Comput. Appl., 2017

Application Modernization for the Exascale Era.
Comput. Sci. Eng., 2017

Application Modernization at LLNL and the Sierra Center of Excellence.
Comput. Sci. Eng., 2017

A Bottleneck-Centric Tuning Policy for Optimizing Energy in Parallel Programs.
Proceedings of the Parallel Computing is Everywhere, 2017

Custom Data Mapping for Composable Data Management.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Directive-Based Partitioning and Pipelining for Graphics Processing Units.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016
Exploiting Redundancy and Application Scalability for Cost-Effective, Time-Constrained Execution of HPC Applications on Amazon EC2.
IEEE Trans. Parallel Distributed Syst., 2016

Evaluating and extending user-level fault tolerance in MPI applications.
Int. J. High Perform. Comput. Appl., 2016

Economic Viability of Hardware Overprovisioning in Power-Constrained High Performance Computing.
Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, 2016

Runtime Correctness Analysis of MPI-3 Nonblocking Collectives.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Approaches for Task Affinity in OpenMP.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

A Case for Extending Task Dependencies.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016


Transactional Memory for Algebraic Multigrid Smoothers.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

I/O Aware Power Shifting.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

MPMD Framework for Offloading Load Balance Computation.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Directive-Based Pipelining Extension for OpenMP.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

A scalable and composable map-reduce system.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015
CoreTSAR: Core Task-Size Adapting Runtime.
IEEE Trans. Parallel Distributed Syst., 2015

Diagnosis of Performance Faults in Large-Scale MPI Applications via Probabilistic Progress-Dependence Inference.
IEEE Trans. Parallel Distributed Syst., 2015

Debugging high-performance computing applications at massive scales.
Commun. ACM, 2015

A Run-Time System for Power-Constrained HPC Applications.
Proceedings of the High Performance Computing - 30th International Conference, 2015

The Spack package manager: bringing order to HPC software chaos.
Proceedings of the International Conference for High Performance Computing, 2015

Decoupled load balancing.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Supporting multiple accelerators in high-level programming models.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

HpMC: An Energy-aware Management System of Multi-level Memory Architectures.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Supporting Indirect Data Mapping in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Enabling Region Merging Optimizations in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Towards Task-Parallel Reductions in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Practical Resource Management in Power-Constrained, High Performance Computing.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Event-Action Mappings for Parallel Tools Infrastructures.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

ALEA: Fine-Grain Energy Profiling with Basic Block Sampling.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Detailed Modeling and Evaluation of a Scalable Multilevel Checkpointing System.
IEEE Trans. Parallel Distributed Syst., 2014

CoreTSAR: Adaptive Worksharing for Heterogeneous Systems.
Proceedings of the Supercomputing - 29th International Conference, 2014

Evaluating User-Level Fault Tolerance for MPI Applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Towards Transactional Memory for OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms: The Effects of Transactional Memory.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Load balancing n-body simulations with highly non-uniform density.
Proceedings of the 2014 International Conference on Supercomputing, 2014

MPI Runtime Error Detection with MUST: A Scalable and Crash-Safe Approach.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Adaptive Configuration Selection for Power-Constrained Heterogeneous Systems.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on amazon EC2.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Memory Usage Optimizations for Online Event Analysis.
Proceedings of the Solving Software Challenges for Exascale, 2014

A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.
IEEE Trans. Parallel Distributed Syst., 2013

Characterizing and mitigating work time inflation in task parallel programs.
Sci. Program., 2013

McrEngine: A scalable checkpointing system using data-aware aggregation and compression.
Sci. Program., 2013

MPI runtime error detection with MUST: Advances in deadlock detection.
Sci. Program., 2013

Parallelizing heavyweight debugging tools with mpiecho.
Parallel Comput., 2013

LIBI: A framework for bootstrapping extreme scale software systems.
Parallel Comput., 2013

Trellis: Portability across architectures with a high-level framework.
J. Parallel Distributed Comput., 2013

Distributed wait state tracking for runtime MPI deadlock detection.
Proceedings of the International Conference for High Performance Computing, 2013

Runtime MPI collective checking with tree-based overlay networks.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Early Experiences with the OpenMP Accelerator Model.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

HPPAC Introduction.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Efficient and Scalable Retrieval Techniques for Global File Properties.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Exploring hardware overprovisioning in power-constrained, high performance computing.
Proceedings of the International Conference on Supercomputing, 2013

Automatically adapting programs for mixed-precision floating-point computation.
Proceedings of the International Conference on Supercomputing, 2013

Massively parallel loading.
Proceedings of the International Conference on Supercomputing, 2013

Intralayer Communication for Tree-Based Overlay Networks.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

A comparative study of high-performance computing on the cloud.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Alignment-Based Metrics for Trace Comparison.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Topic 1: Support Tools and Environments - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012
Critical path-based thread placement for NUMA systems.
SIGMETRICS Perform. Evaluation Rev., 2012

Design and modeling of a non-blocking checkpointing system.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Poster: Evaluation Topology Mapping via Graph Partitioning.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Evaluating Topology Mapping via Graph Partitioning.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

MPI Runtime Error Detection with MUST: Advanced Error Reports.
Proceedings of the Tools for High Performance Computing 2012, 2012

A Case for Including Transactions in OpenMP II: Hardware Transactional Memory.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces.
Proceedings of the International Symposium on Memory Management, 2012

HPPAC Introduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Heterogeneous Task Scheduling for Accelerated OpenMP.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Beyond DVFS: A First Look at Performance under a Hardware-Enforced Power Bound.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Holistic Debugging of MPI Derived Datatypes.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Critical-Path Based Performance Analysis.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Model-based, memory-centric performance and power optimization on NUMA multiprocessors.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Quantifying the effectiveness of load balance algorithms.
Proceedings of the International Conference on Supercomputing, 2012

Integrated in-system storage architecture for high performance computing.
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers, 2012

Fault resilience of the algebraic multi-grid solver.
Proceedings of the International Conference on Supercomputing, 2012

Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Asynchronous checkpoint migration with MRNet in the Scalable Checkpoint / Restart Library.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

Automatic fault characterization via abnormality-enhanced classification.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Probabilistic diagnosis of performance faults in large-scale parallel applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
The scalable process topology interface of MPI 2.2.
Concurr. Comput. Pract. Exp., 2011

Formal analysis of MPI-based parallel programs.
Commun. ACM, 2011

Large scale debugging of parallel tasks with AutomaDeD.
Proceedings of the Conference on High Performance Computing Networking, 2011

Exascale Algorithms for Generalized MPI_Comm_split.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Order Preserving Event Aggregation in TBONs.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

OpenMP for Accelerators.
Proceedings of the OpenMP in the Petascale Era - 7th International Workshop on OpenMP, 2011

Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Exploiting Data Similarity to Reduce Memory Footprints.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Practical performance prediction under Dynamic Voltage Frequency Scaling.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

Scalable memory registration for high performance networks using helper threads.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Large Scale Verification of MPI Programs Using Lamport Clocks with Lazy Update.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Transforming MPI source code based on communication patterns.
Future Gener. Comput. Syst., 2010

A Scalable and Distributed Dynamic Formal Verifier for MPI Programs.
Proceedings of the Conference on High Performance Computing Networking, 2010

Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System.
Proceedings of the Conference on High Performance Computing Networking, 2010

Efficient MPI Support for Advanced Hybrid Programming Models.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale.
Proceedings of the Applied Parallel and Scientific Computing, 2010

Towards an Error Model for OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A Case for Including Transactions in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A Proposal for User-Defined Reductions in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Hybrid MPI/OpenMP power-aware computing.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Using focused regression for accurate time-constrained scaling of scientific applications.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Clustering performance data efficiently at massive scales.
Proceedings of the 24th International Conference on Supercomputing, 2010

Exploitation of Dynamic Communication Patterns through Static Analysis.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Comparing Scalability Prediction Strategies on an SMP of CMPs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

AutomaDeD: Automata-based debugging for dissimilar parallel tasks.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Minimizing MPI Resource Contention in Multithreaded Multicore Environments.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing.
J. Parallel Distributed Comput., 2009

CLOMP: Accurately Characterizing OpenMP Application Overheads.
Int. J. Parallel Program., 2009

Scalable temporal order analysis for large scale debugging.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs.
Proceedings of the Tools for High Performance Computing 2009, 2009

PSMalloc: content based memory management for MPI applications.
Proceedings of the 10th workshop on MEmory performance, 2009

Machine learning based online performance prediction for runtime parallelization and task scheduling.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Adagio: making DVS practical for complex HPC applications.
Proceedings of the 23rd international conference on Supercomputing, 2009

A graph based approach for MPI deadlock detection.
Proceedings of the 23rd international conference on Supercomputing, 2009

2008
Efficient architectural design space exploration via predictive modeling.
ACM Trans. Archit. Code Optim., 2008

BlueGene/L applications: Parallelism On a Massive Scale.
Int. J. High Perform. Comput. Appl., 2008

Lessons learned at 208K: towards debugging millions of cores.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Scalable load-balance measurement for SPMD codes.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

On the Performance of Transparent MPI Piggyback Messages.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Preserving time in large-scale communication traces.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Soft error vulnerability of iterative linear algebra methods.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

A regression-based approach to scalability prediction.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Detecting Patterns in MPI Communication Traces.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Overcoming Scalability Challenges for Tool Daemon Launching.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Using MPI Communication Patterns to Guide Source Code Transformations.
Proceedings of the Computational Science, 2008

Prediction models for multi-dimensional power-performance optimization on many cores.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies.
ACM Trans. Program. Lang. Syst., 2007

Dynamic Binary Instrumentation and Data Aggregation on Large Scale Systems.
Int. J. Parallel Program., 2007

Complete Formal Specification of the OpenMP Memory Model.
Int. J. Parallel Program., 2007

Predicting parallel application performance via machine learning approaches.
Concurr. Comput. Pract. Exp., 2007

PᴺMPI tools: a whole lot greater than the sum of their parts.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Bounding energy consumption in large-scale MPI programs.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Methods of inference and learning for performance modeling of parallel applications.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Benchmarking the Stack Trace Analysis Tool for BlueGene/L.
Proceedings of the Parallel Computing: Architectures, 2007

Scalable Compression and Replay of Communication Traces in Massively Parallel Environments.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Stack Trace Analysis for Large Scale Debugging.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Pynamic: the Python Dynamic Benchmark.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Practical Differential Profiling.
Proceedings of the Euro-Par 2007, 2007

Identifying energy-efficient concurrency levels using machine learning.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques.
ACM Trans. Archit. Code Optim., 2006

Poster reception - Scalable compression and replay of communication traces in massively parallel environments.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Patterns in parallel programs: toward high-level understanding of large-scale traces.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Formal Specification of the OpenMP Memory Model.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

Improving distributed memory applications testing by message perturbation.
Proceedings of the 4th Workshop on Parallel and Distributed Systems: Testing, 2006

Dynamic program phase detection in distributed shared-memory multiprocessors.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A Flexible and Dynamic Infrastructure for MPI Tool Interoperability.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Exploring Unexpected Behavior in MPI.
Proceedings of the High Performance Computing and Communications, 2006

Topic 1: Support Tools and Environments.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Toward Enhancing OpenMP's Work-Sharing Directives.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Efficiently exploring architectural design spaces via predictive modeling.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
Scalable dynamic binary instrumentation for Blue Gene/L.
SIGARCH Comput. Archit. News, 2005

Evaluating high-performance computers.
Concurr. Pract. Exp., 2005

Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Tera-Scalable Algorithms for Variable-Density Elliptic Hydrodynamics with Spectral Accuracy.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

The OpenMP Memory Model.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Improving the computational intensity of unstructured mesh applications.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A hybrid hardware/software approach to efficiently determine cache coherence Bottlenecks.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005


An Approach to Performance Prediction for Parallel Applications.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2003
A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

DMPL: An OpenMP DLL Debugging Interface.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Identifying and Exploiting Spatial Regularity in Data Memory References.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Semantic-Driven Parallelization of Loops Operating on User-Defined Containers.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

METRIC: Tracking Down Inefficiencies in the Memory Hierarchy via Binary Rewriting.
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002
A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids.
CoRR, 2002

2000
Delta coherence protocols.
IEEE Concurr., 2000

Dynamic Software Testing of MPI Applications with Umpire.
Proceedings of Supercomputing 2000, 2000

Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

1999
Benchmarking Pthreads Performance.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Experience with Mixed MPI/Threaded Programming Models.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Accurately Measuring MPI Broadcasts in a Computational Grid.
Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, 1999

