Aamir Shafi

Manjunath Gorentla Venkata

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

2024

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer.

CoRR, 2024

Accelerating communication with multi-HCA aware collectives in MPI.

Concurr. Comput. Pract. Exp., 2024

Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning.

Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024

Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters.

Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI.

Mingzhe Han

Goutham Kalikrishna Reddy Kuncham

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.

Nick Contini

Nawras Alnaasan

Mustafa Abduljabbar

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

The Case for Co-Designing Model Architectures with Hardware.

Proceedings of the 53rd International Conference on Parallel Processing, 2024

Demystifying the Communication Characteristics for Distributed Transformer Models.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

Using BlueField-3 SmartNICs to Offload Vector Operations in Krylov Subspace Methods.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

HyperSack: Distributed Hyperparameter Optimization for Deep Learning using Resource-Aware Scheduling on Heterogeneous GPU Systems.

Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

Accelerating Large Language Model Training with Hybrid GPU-based Compression.

Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023

High Performance MPI over the Slingshot Interconnect.

J. Comput. Sci. Technol., February, 2023

Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries.

IEEE Micro, 2023

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version.

CoRR, 2023

Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences.

Benjamin Michalowicz

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Performance Characterization of Using Quantization for DNN Inference on Edge Devices.

Proceedings of the 7th IEEE International Conference on Fog and Edge Computing, 2023

Designing In-network Computing Aware Reduction Collectives in MPI.

Goutham Kalikrishna Reddy Kuncham

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference.

Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences.

Chen-Chun Chen

Goutham Kalikrishna Reddy Kuncham

Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

ScaMP: Scalable Meta-Parallelism for Deep Learning Search.

Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training.

Proceedings of the IEEE International Conference on Big Data, 2023

MPI4Spark Meets YARN: Enhancing MPI4Spark through YARN support for HPC.

Proceedings of the IEEE International Conference on Big Data, 2023

2022

Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs.

IEEE Micro, 2022

High Performance MPI over the Slingshot Interconnect: Early Experiences.

Proceedings of the PEARC '22: Practice and Experience in Advanced Research Computing, Boston, MA, USA, July 10, 2022

Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters.

Qinghua Zhou

Pouya Kousha

Quentin Anthony

Proceedings of the High Performance Computing - 37th International Conference, 2022

"Hey CAI" - Conversational AI Enabled User Interface for HPC Tools.

Proceedings of the High Performance Computing - 37th International Conference, 2022

Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters.

Proceedings of the High Performance Computing - 37th International Conference, 2022

Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems.

Chen-Chun Chen

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Towards Java-based HPC using the MVAPICH2 Library: Early Experiences.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Designing Hierarchical Multi-HCA Aware Allgather in MPI.

Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022

Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads.

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.

Kamal Raj Sankarapandian Dayala Ganesh Ram

Akshay Paniraja Guptha

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries.

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters.

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Lightning Talks of EduHPC 2022.

Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022

Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI.

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications.

Pouya Kousha

Proceedings of the PEARC '21: Practice and Experience in Advanced Research Computing, 2021

Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2021

Layout-aware Hardware-assisted Designs for Derived Data Types in MPI.

Seyedeh Mahdieh Ghazimirsaeed

Chen-Chun Chen

Mohammadreza Bayatpour

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems.

Jahanzeb Maqbool Hashmi

Shulei Xu

Seyedeh Mahdieh Ghazimirsaeed

Mohammadreza Bayatpour

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

Efficient MPI-based Communication for GPU-Accelerated Dask Applications.

Jahanzeb Maqbool Hashmi

Seyedeh Mahdieh Ghazimirsaeed

Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020

Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR.

Quentin Anthony

Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020

Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications.

Jahanzeb Maqbool Hashmi

Mohammed Abdulrahman Alqahtani

Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019

Student Outcomes Assessment Methodology for ABET Accreditation: A Case Study of Computer Science and Computer Information Systems Programs.

IEEE Access, 2019

2018

Parameter estimation of qualitative biological regulatory networks on high performance computing hardware.

BMC Syst. Biol., 2018

Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

2016

An efficient schedulability condition for non-preemptive real-time systems at common scheduling points.

J. Supercomput., 2016

Towards Scalable Java HPC with Hybrid and Native Communication Devices in MPJ Express.

Int. J. Parallel Program., 2016

2015

Virtual TCAM for Data Center switches.

Proceedings of the IEEE Conference on Network Function Virtualization and Software Defined Networks, 2015

MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems.

Proceedings of the International Conference on Computational Science, 2015

2014

Design and Implementation of Parallel Debugger and Profiler for MPJ Express.

Aleem Akhtar

Mohsan Jameel

CoRR, 2014

Teaching parallel programming using Java.

Proceedings of the Workshop on Education for High-Performance Computing, 2014

Design and Implementation of Hybrid and Native Communication Devices for Java HPC.

Proceedings of the International Conference on Computational Science, 2014

High Performance Message-passing InfiniBand Communication Device for Java HPC.

Omar Khan

Mohsan Jameel

Proceedings of the International Conference on Computational Science, 2014

2013

An architectural evaluation of SDN controllers.

Proceedings of IEEE International Conference on Communications, 2013

An MPI-IO Compliant Java Based Parallel I/O Library.

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

Memory-mapping support for reducer hyperobjects.

I-Ting Angelina Lee

Charles E. Leiserson

Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Towards Efficient Support for Parallel I/O in Java HPC.

Proceedings of the 13th International Conference on Parallel and Distributed Computing, 2012

High performance Java sockets (HPJS) for scientific health clouds.

Proceedings of the IEEE 14th International Conference on e-Health Networking, 2012

2011

Collective Asynchronous Remote Invocation (CARI): A High-Level and Efficient Communication API for Irregular Applications.

Wakeel Ahmad

Proceedings of the International Conference on Computational Science, 2011

Device level communication libraries for high-performance computing in Java.

Concurr. Comput. Pract. Exp., 2011

2010

Multicore-enabling the MPJ express messaging library.

Proceedings of the 8th International Conference on Principles and Practice of Programming in Java, 2010

2009

Nested parallelism for multi-core HPC systems using Java.

J. Parallel Distributed Comput., 2009

A comparative study of Java and C performance in two large-scale parallel applications.

Concurr. Comput. Pract. Exp., 2009

Towards efficient shared memory communications in MPJ express.

Jawad Manzoor

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

A parallel implementation of the Finite-Domain Time-Difference algorithm using MPJ express.

Aftab Hussain

Jamil Raza

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007

A Buffering Layer to Support Derived Types and Proprietary Networks for Java HPC.

Scalable Comput. Pract. Exp., 2007

2006

Nested parallelism for multi-core systems using Java.

PhD thesis, 2006

MPJ Express Meets Gadget: Towards a Java Code for Cosmological Simulations.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Parallel and Distributed Computing with Java.

Mark A. Baker

Matthew Grove

Proceedings of the 5th International Symposium on Parallel and Distributed Computing (ISPDC 2006), 2006

An Approach to Buffer Management in Java HPC Messaging.

Proceedings of the Computational Science, 2006

MPJ Express: Towards Thread Safe Java HPC.
