Nathan R. Tallent

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2025

Exploration of LLM Lossless Compression on Scientific Data.

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium, 2025

BMQSim: Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework.

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

ProHD: Projection-Based Hausdorff Distance Approximation.

Proceedings of the IEEE International Conference on Data Mining, 2025

PowerTrip: Exploiting Federated Heterogeneous Datacenter Power for Distributed ML Training.

Proceedings of the 2025 ACM Symposium on Cloud Computing, 2025

2024

Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security.

CoRR, 2024

Workflows Community Summit 2024: Future Trends and Challenges in Scientific Workflows.

Rafael Ferreira da Silva

Anderson Andrei Da Silva

Rolando P. Hong Enriquez

Liliane N. O. Kunstmann

Bruno de Paula Kinoshita

CoRR, 2024

Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework.

CoRR, 2024

OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data.

CoRR, 2024

SAM-I-Am: Semantic Boosting for Zero-shot Atomic-Scale Electron Micrograph Segmentation.

CoRR, 2024

Shifting Between Compute and Memory Bounds: A Compression-Enabled Roofline Model.

Ramasoumya Naraparaju

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Understanding and Predicting Cross-Application I/O Interference in HPC Storage Systems.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

MemFriend: Understanding Memory Performance with Spatial-Temporal Affinity.

Proceedings of the International Symposium on Memory Systems, 2024

Automatic Extraction of Network Configurations for Realistic Simulation and Validation.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

Performance Analysis of Data Processing in Distributed File Systems with Near Data Processing.

Proceedings of the International Symposium on Networks, Computers and Communications, 2024

Graph Analytics on Jellyfish topology.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

HPC Network Simulation Tuning via Automatic Extraction of Hardware Parameters.

Proceedings of the IEEE High Performance Extreme Computing Conference, 2024

DaYu: Optimizing Distributed Scientific Workflows by Decoding Dataflow Semantics and Dynamics.

Proceedings of the IEEE International Conference on Cluster Computing, 2024

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs.

Proceedings of the IEEE International Conference on Cluster Computing, 2024

Improving I/O-aware Workflow Scheduling via Data Flow Characterization and Trade-off Analysis.

Proceedings of the IEEE International Conference on Big Data, 2024

Identifying Outliers in AI-based Image Compression.

Proceedings of the IEEE International Conference on Big Data, 2024

2023

Accelerating matrix-centric graph processing on GPUs through bit-level optimizations.

J. Parallel Distributed Comput., July, 2023

Data Flow Lifecycles for Optimizing Workflow Coordination.

Proceedings of the International Conference for High Performance Computing, 2023

2022

Characterizing Performance of Graph Neighborhood Communication Patterns.

Sayan Ghosh

IEEE Trans. Parallel Distributed Syst., 2022

QuaL²M: Learning Quantitative Performance of Latency-Sensitive Code.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

ReWorDs 2022 Keynote: Towards Orchestrating Distributed & Data-Intensive Workflows.

Proceedings of the 18th IEEE International Conference on e-Science, 2022

MemGaze: Rapid and Effective Load-Level Memory Trace Analysis.

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

EXAGRAPH: Graph and combinatorial methods for enabling exascale applications.

Sivasankaran Rajamanickam

Oguz Selvitopi

Antonino Tumeo

Int. J. High Perform. Comput. Appl., 2021

Single-node partitioned-memory for huge graph analytics: cost and performance trade-offs.

Sayan Ghosh

Marco Minutoli

Ramesh Peri

Ananth Kalyanaraman

Proceedings of the International Conference for High Performance Computing, 2021

Diolkos: improving ethernet throughput through dynamic port selection.

Proceedings of the CF '21: Computing Frontiers Conference, 2021

WinnowML: Stable feature selection for maximizing prediction accuracy of time-based system modeling.

Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2020

Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect.

IEEE Trans. Parallel Distributed Syst., 2020

Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing.

Future Gener. Comput. Syst., 2020

Rapid Memory Footprint Access Diagnostics.

Ozgur O. Kilic

Ryan D. Friese

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Geomancy: Automated Performance Enhancement through Data Layout Optimization.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Vertex Reordering for Real-World Graphs and Applications: An Empirical Evaluation.

Reet Barik

Marco Minutoli

Ananth Kalyanaraman

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Effectively Using Remote I/O For Work Composition in Distributed Workflows.

Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

2019

Rapidly Measuring Loop Footprints.

Ozgur O. Kilic

Ryan D. Friese

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows.

Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018

Stochastic Programming Approach for Resource Selection Under Demand Uncertainty.

Tanveer Hossain Bhuiyan

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2018

Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Optimizing Distributed Data-Intensive Workflows.

Ryan D. Friese

Malachi Schram

Kevin J. Barker

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Deep Learning for Enhancing Fault Tolerant Capabilities of Scientific Workflows.

Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017

Representative paths analysis.

Darren J. Kerbyson

Adolfy Hoisie

Proceedings of the International Conference for High Performance Computing, 2017

Evaluating On-Node GPU Interconnects for Deep Learning Workloads.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2017

Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Generating Performance Models for Irregular Applications.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

Assessing Advanced Technology in CENATE.

Proceedings of the IEEE International Conference on Networking, 2016

Modeling the Impact of Silicon Photonics on Graph Analytics.

Daniel G. Chavarría-Miranda

Kevin J. Barker

Antonino Tumeo

Andrés Márquez

Darren J. Kerbyson

Adolfy Hoisie

Proceedings of the IEEE International Conference on Networking, 2016

Fault Modeling of Extreme Scale Applications Using Machine Learning.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Algorithm and Architecture Independent Benchmarking with SEAK.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

A case for application-oblivious energy-efficient MPI runtime.

Proceedings of the International Conference for High Performance Computing, 2015

Towards efficient scheduling of data intensive high energy physics workflows.

Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, 2015

Diagnosing the causes and severity of one-sided message contention.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Power and performance trade-offs for Space Time Adaptive Processing.

Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

2014

Palm: easing the burden of analytical performance modeling.

Adolfy Hoisie

Proceedings of the 2014 International Conference on Supercomputing, 2014

2011

Using Sampling to Understand Parallel Program Performance.

Proceedings of the Tools for High Performance Computing 2011, 2011

Scalable fine-grained call path tracing.

Michael Franco

Reed Landrum

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

2010

HPCTOOLKIT: tools for performance analysis of optimized parallel programs.

Concurr. Comput. Pract. Exp., 2010

Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles.

Proceedings of the Conference on High Performance Computing Networking, 2010

Analyzing lock contention in multithreaded applications.

Allan Porterfield

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Effectively Presenting Call Path Profiles of Application Performance.

Proceedings of the 39th International Conference on Parallel Processing, 2010

2009

Identifying Performance Bottlenecks in Work-Stealing Computations.

Computer, 2009

Diagnosing performance bottlenecks in emerging petascale applications.

Michael W. Fagan

Mark Krentel

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Effective performance measurement and analysis of multithreaded applications.

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Binary analysis for measurement and attribution of program performance.

Michael W. Fagan

Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009

2008

OpenAD/F: A Modular Open-Source Tool for Automatic Differentiation of Fortran Codes.

Michelle Mills Strout

Patrick Heimbach

Chris Hill

Carl Wunsch

ACM Trans. Math. Softw., 2008

2002

HPCVIEW: A Tool for Top-down Analysis of Node Performance.

Robert J. Fowler

Gabriel Marin