Kaushik Kandadi Suresh

CoRR, April, 2026

2025

Characterizing Communication Patterns in Distributed Large Language Model Inference.

Lang Xu

Quentin Anthony

Nawras Alnaasan

Goutham Kalikrishna Reddy Kuncham

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2025

Performance Characterization of Data Transfer and Allocation Strategies on AMD MI300A APUs: Early Experiences.

Siyuan Zhang

Proceedings of the 32nd IEEE International Conference on High Performance Computing, Data and Analytics, HiPC 2025, 2025

Towards Dynamic Message Passing Protocols for Stencil-Based Communication Patterns.

Goutham Kalikrishna Reddy Kuncham

Proceedings of the IEEE International Conference on Cluster Computing, 2025

2024

HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.

Nick Contini

Nawras Alnaasan

Mustafa Abduljabbar

Aamir Shafi

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Using BlueField-3 SmartNICs to Offload Vector Operations in Krylov Subspace Methods.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

Effective and Efficient Offloading Designs for One-Sided Communication to SmartNICs.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

2023

Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries.

Kawthar Shafie Khorassani

IEEE Micro, 2023

DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency Of SmartNICs.

Steve Poole

Proceedings of the Practice and Experience in Advanced Research Computing, 2023

A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication.

Nicholas Contini

Proceedings of the 37th International Conference on Supercomputing, 2023

Designing In-network Computing Aware Reduction Collectives in MPI.

Goutham Kalikrishna Reddy Kuncham

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs.

Stephen W. Poole

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

2022

Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries.

Kawthar Shafie Khorassani

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022

Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.

Akshay Paniraja Guptha

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

2021

Layout-aware Hardware-assisted Designs for Derived Data Types in MPI.

Seyedeh Mahdieh Ghazimirsaeed

Chen-Chun Chen

Aamir Shafi

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

2020

Communication-Aware Hardware-Assisted MPI Overlap Engine.

Jahanzeb Maqbool Hashmi

Sourav Chakraborty

Seyedeh Mahdieh Ghazimirsaeed

Proceedings of the High Performance Computing - 35th International Conference, 2020

Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System.

Nick Sarkauskas

Jahanzeb Maqbool Hashmi

Proceedings of the Workshop on Exascale MPI, 2020

Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI.

Seyedeh Mahdieh Ghazimirsaeed

Jahanzeb Maqbool Hashmi

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019

Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters.

Pouya Kousha