Interface for Sparse Linear Algebra Operations.
CoRR, 2024

The GraphBLAS 3.0 Project.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions.
ACM Trans. Archit. Code Optim., December, 2023

Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering.
Softw. Pract. Exp., September, 2023

YaConv: Convolution with Low Cache Footprint.
ACM Trans. Archit. Code Optim., March, 2023

C++ and Interoperability Between Libraries: The GraphBLAS C++ Specification.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Compiling for the IBM Matrix Engine for Enterprise Workloads.
IEEE Micro, 2022

Exploiting the New Power ISA™ Matrix Math Instructions Through Compiler Built-ins.
Proceedings of the Languages and Compilers for Parallel Computing, 2022

Modeling Matrix Engines for Portability and Performance.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

GraphBLAS: C++ Iterators for Sparse Matrices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Dense dynamic blocks: optimizing SpMM for processors with vector and matrix units using machine learning techniques.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Return-oriented programming protection in the IBM POWER10.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

KernelFaRer: Replacing Native-Code Idioms with High-Performance Library Calls.
ACM Trans. Archit. Code Optim., 2021

IBM's POWER10 Processor.
IEEE Micro, 2021

Encrypted Data Processing.
CoRR, 2021

A matrix math facility for Power ISA(TM) processors.
CoRR, 2021

Introduction to GraphBLAS 2.0.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Considerations for a Distributed GraphBLAS API.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

A Roadmap for the GraphBLAS C++ API.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

LAGraph: A Community Effort to Collect Graph Algorithms Built on Top of the GraphBLAS.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations.
Int. J. Parallel Program., 2018

IBM POWER9 processor core.
IBM J. Res. Dev., 2018

IBM POWER9 and cognitive computing.
IBM J. Res. Dev., 2018

Implementing the GraphBLAS C API.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

GraphBLAS: handling performance concerns in large graph analytics.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

Design of the GraphBLAS API for C.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

GraphBLAS C API: Ideas for future versions of the specification.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Enabling massive deep neural networks with the GraphBLAS.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Introduction to the Special Issue on PPoPP'14.
ACM Trans. Parallel Comput., 2016

Workshop on high-performance computational finance.
Concurr. Comput. Pract. Exp., 2016

Speeding Up Stencil Computations with Kernel Convolution.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Efficient implementation of scatter-gather operations for large scale graph analytics.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Graph programming interface (GPI): a linear algebra programming model for large scale graph computations.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

IBM POWER8 processor core microarchitecture.
IBM J. Res. Dev., 2015

Simple, portable and fast SIMD intrinsic programming: generic simd library.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Latest trends in computer architectures and parallel and distributed technologies.
Concurr. Comput. Pract. Exp., 2013

Design and Implementation of a Scalable Membership Service for Supercomputer Resiliency-Aware Runtime.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Special Issue for the Workshop on High Performance Computational Finance.
Concurr. Comput. Pract. Exp., 2012

Dynamic method to evaluate code optimization effectiveness.
Proceedings of the Workshop on Software and Compilers for Embedded Systems, 2012

Accelerating business analytics applications.
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

IBM RS/6000 SP.
Encyclopedia of Parallel Computing, 2011

IBM Power Architecture.
Encyclopedia of Parallel Computing, 2011

IBM Blue Gene Supercomputer.
Encyclopedia of Parallel Computing, 2011

Guest Editors' Introduction.
Int. J. Parallel Program., 2011

Poster: scalable infrastructure to support supercomputer resiliency-aware applications and load balancing.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

The Case for Full-Throttle Computing: An Alternative Datacenter Design Strategy.
IEEE Micro, 2010

Scalable data center provisioning and control.
IBM J. Res. Dev., 2009

Kittyhawk: Enabling cooperation and competition in a global, shared computational system.
IBM J. Res. Dev., 2009

Fifth International Workshop on System Management Techniques, Processes, and Services (SMTPS).
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

True value: assessing and optimizing the cost of computing at the data center level.
Proceedings of the 6th Conference on Computing Frontiers, 2009

Multitoroidal Interconnects For Tightly Coupled Supercomputers.
IEEE Trans. Parallel Distributed Syst., 2008

BlueGene/L applications: Parallelism On a Massive Scale.
Int. J. High Perform. Comput. Appl., 2008

Scalable server provisioning with HOP-SCOTCH.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

The Blue Gene/L Supercomputer: A Hardware and Software Story.
Int. J. Parallel Program., 2007

Performance Evaluation of a Commercial Application, Trade, in Scale-out Environments.
Proceedings of the 15th International Symposium on Modeling, 2007

Performance Studies of a WebSphere Application, Trade, in Scale-out and Scale-up Environments.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Scale-up x Scale-out: A Case Study using Nutch/Lucene.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Base Operating System Provisioning and Bringup for a Commercial Supercomputer.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Scalability of the Nutch search engine.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Experiences Understanding Performance in a Commercial Scale-Out Environment.
Proceedings of the Euro-Par 2007, 2007

HPC-Colony: services and interfaces for very large systems.
ACM SIGOPS Oper. Syst. Rev., 2006

Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture.
J. Embed. Comput., 2006

Blue Gene system software - Topology mapping for Blue Gene/L supercomputer.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Blue Gene system software - Designing a highly-scalable operating system: the Blue Gene/L story.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Blue Gene system software - Design and implementation of a one-sided communication interface for the IBM eServer Blue Gene® supercomputer.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Delivering Teraflops: An Account of how Blue Gene was Brought to Life.
Proceedings of the 2006 IEEE John Vincent Atanasoff International Symposium on Modern Computing (JVA2006), 2006

A database-centric approach to system management in the Blue Gene/L supercomputer.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Achieving Breakthrough Science with the Blue Gene/L Supercomputer.
Proceedings of the Computational Science, 2006

High performance file I/O for the Blue Gene/L supercomputer.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

VICTORIA: VMX indirect compute technology oriented towards in-line acceleration.
Proceedings of the Third Conference on Computing Frontiers, 2006

Blue Gene/L programming and operating environment.
IBM J. Res. Dev., 2005

Blue Gene/L performance tools.
IBM J. Res. Dev., 2005

Resource allocation and utilization in the Blue Gene/L supercomputer.
IBM J. Res. Dev., 2005

Design and implementation of message-passing services for the Blue Gene/L supercomputer.
IBM J. Res. Dev., 2005

Open Job Management Architecture for the Blue Gene/L Supercomputer.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2005

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Optimization of MPI collective communication on BlueGene/L systems.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

The Evolution of the Blue Gene/L Supercomputer.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Early Experience with Scientific Applications on the Blue Gene/L Supercomputer.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Probabilistic QoS Guarantees for Supercomputing Systems.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

Filtering Failure Logs for a BlueGene/L Prototype.
Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June, 2005

Unlocking the Performance of the BlueGene/L Supercomputer.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Architecture and Performance of the BlueGene/L Message Layer.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

The Hierarchically Tiled Arrays programming approach.
Proceedings of the 7th Workshop on languages, 2004

Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Multi-toroidal Interconnects: Using Additional Communication Links to Improve Utilization of Parallel Computers.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2004

The BlueGene/L pseudo cycle-accurate simulator.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Fault-Aware Job Scheduling for BlueGene/L Systems.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Adaptive incremental checkpointing for massively parallel systems.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

What are the future trends in high-performance interconnects for parallel computers? [Panel 1].
Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration.
IEEE Trans. Parallel Distributed Syst., 2003

Dissecting Cyclops: a detailed analysis of a multithreaded architecture.
SIGARCH Comput. Archit. News, 2003

An Overview Of The Bluegene/L System Software Organization.
Parallel Process. Lett., 2003

Supporting multidimensional arrays in Java.
Concurr. Comput. Pract. Exp., 2003

Evaluation of OpenMP for the Cyclops Multithreaded Architecture.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Enabling Dual-Core Mode in BlueGene/L: Challenges and Solutions.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

MPI on BlueGene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

Programming for Locality and Parallelism with Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Critical event prediction for proactive management in large-scale computer clusters.
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24, 2003

Gang Scheduling Extensions for I/O Intensive Workloads.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003

A Volumetric FFT for BlueGene/L.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

Obtaining Hardware Performance Metrics for the BlueGene/L Supercomputer.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

An Overview of the Blue Gene/L System Software Organization.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

NINJA: Java for high performance numerical computing.
Sci. Program., 2002

Semi-hierarchical approach for reliability, availability, and serviceability of cellular systems.
SIGARCH Comput. Archit. News, 2002

Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer.
Int. J. Parallel Program., 2002

Hypergeometric Functions in Exact Geometric Computation.
Proceedings of the Computability and Complexity in Analysis, 2002

Modeling and analysis of dynamic coscheduling in parallel and distributed environments.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2002

An overview of the BlueGene/L Supercomputer.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Job Scheduling for the BlueGene/L System.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2002

A C++ Implementation of the Co-Array Programming Model for Blue Gene/L.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Evaluation of a Multithreaded Architecture for Cellular Computing.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Job Scheduling for the BlueGene/L System (Research Note).
Proceedings of the Euro-Par 2002, 2002

Impact of Workload and System Parameters on Next Generation Cluster Scheduling Mechanisms.
IEEE Trans. Parallel Distributed Syst., 2001

Blue Gene: A vision for protein science using a petaflop supercomputer.
IBM Syst. J., 2001

Java and numerical computing.
Comput. Sci. Eng., 2001

The NINJA project.
Commun. ACM, 2001

A comparison of three approaches to language, compiler, and library support for multidimensional arrays in Java.
Proceedings of the ACM 2001 Java Grande Conference, Stanford University, California, USA, 2001

Demonstrating the scalability of a molecular dynamics application on a Petaflop computer.
Proceedings of the 15th international conference on Supercomputing, 2001

Blue Gene: A Massively Parallel System.
Proceedings of the Computational Science - ICCS 2001, 2001

From flop to megaflops: Java for technical computing.
ACM Trans. Program. Lang. Syst., 2000

Automatic Loop Transformations and Parallelization for Java.
Parallel Process. Lett., 2000

Java programming for high-performance numerical computing.
IBM Syst. J., 2000

JavaGrande - High Performance Computing with Java.
Proceedings of the Applied Parallel Computing, 2000

Design and evaluation of a linear algebra package for Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

Improving Parallel Job Scheduling by Combining Gang Scheduling and Backfilling Techniques.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A simulation-based study of scheduling mechanisms for a dynamic cluster environment.
Proceedings of the 14th international conference on Supercomputing, 2000

The Impact of Migration on Parallel Job Scheduling for Distributed Systems.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

High Performance Computing with the Array Package for Java: A Case Study using Data Mining.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Semantic Inlining - the Compiler Support for Java in Technical Computing.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

A Standard Java Array Package for Technical Computing.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

High Performance Numerical Computing in Java: Language and Compiler Issues.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

Efficient Support for Complex Numbers in Java.
Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

Process Tracking for Parallel Job Control.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 1999

A Gang-Scheduling System for ASCI Blue-Pacific.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

The fused multiply-add instruction leads to algorithms for extended-precision floating point: applications to java and high-performance computing.
Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative Research, 1999

Dynamic Data Distribution and Processor Repartitioning for Irregularly Structured Computations.
J. Parallel Distributed Comput., 1998

Optimizing Array Reference Checking in Java Programs.
IBM Syst. J., 1998

An Infrastructure for Efficient Parallel Job Execution in Terascale Computing Environments.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

Dynamic resource management on distributed systems using reconfigurable applications.
IBM J. Res. Dev., 1997

A Checkpointing Strategy for Scalable Recovery on Distributed Parallel Systems.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

Design and Implementation of Computational Steering for Parallel Scientific Applications.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Run-time Support for Dynamic Processor Allocation in HPF Programs.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

A Programming Environment for Dynamic Resource Allocation and Data Distribution.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

Supporting Dynamic Data and Processor Repartitioning for Irregular Applications.
Proceedings of the Parallel Algorithms for Irregularly Structured Problems, 1996

Application-Assisted Dynamic Scheduling on Large-Scale Multi-Computer Systems.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

The Performance Impact of Granularity Control and Functional Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Autoscheduling in a Distributed Shared-Memory Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 1994
