Scalable training of trustworthy and energy-efficient predictive graph foundation models for atomistic materials modeling: a case study with HydraGNN.

Massimiliano Lupo Pasini

Jong Youl Choi

J. Supercomput., March, 2025

ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling.

Proceedings of the International Conference for High Performance Computing, 2025

2024

Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN.

Massimiliano Lupo Pasini

CoRR, 2024

ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability.

Proceedings of the International Conference for High Performance Computing, 2024

2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.

Cindy Orozco Bohorquez

Massimiliano Lupo Pasini

CoRR, 2023

A Research Retrospective on AMD's Exascale Computing Journey.

Gabriel H. Loh

Michael J. Schulte

Mike Ignatowski

Vignesh Adhinarayanan

Kishore Punniyamurthy

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2019

Optimizing Hyperplane Sweep Operations Using Asynchronous Multi-grain GPU Tasks.

Anirudh Mohan Kaushik

Ashwin M. Aji

Muhammad Amber Hassaan

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Investigating Data Layout Transformations in Chapel.

Apan Qasem

Ashwin M. Aji

Michael L. Chu

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Taming irregular applications via advanced dynamic parallelism on GPUs.

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017

Characterizing data organization effects on heterogeneous memory architectures.

Apan Qasem

Ashwin M. Aji

Gregory Rodgers

Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016

MPI-ACC: Accelerator-Aware MPI for Scientific Applications.

John M. Mellor-Crummey

Xiaosong Ma

Rajeev Thakur

IEEE Trans. Parallel Distributed Syst., 2016

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL.

Parallel Comput., 2016

Implementing directed acyclic graphs with the heterogeneous system architecture.

Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

2015

Programming High-Performance Clusters with Heterogeneous Computing Devices.

Ashwin M. Aji

PhD thesis, 2015

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2013

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Online Performance Projection for Clusters with Heterogeneous GPUs.

Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.

Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

On the efficacy of GPU-integrated MPI for scientific applications.

John M. Mellor-Crummey

Xiaosong Ma

Rajeev Thakur

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Contagion Diffusion with EpiSimdemics.

Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach, 2013

2012

Efficient Intranode Communication in GPU-Accelerated Systems.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Simulating the Spread of Infectious Disease over Large Realistic Social Networks Using Charm++.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

DMA-Assisted, Intranode Communication in GPU Accelerated Systems.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

2011

Poster: large-scale computational epidemiology modeling using Charm++.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

High-performance biocomputing for simulating the spread of contagion over large contact networks.
