Srinivas Sridharan

ORCID: 0009-0001-0651-370X

Affiliations:

Intel Corporation, Hillsboro, OR, USA
Intel Corporation, Bangalore, India
University of Notre Dame, IN, USA

According to our database, Srinivas Sridharan authored at least 37 papers between 2007 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces.

CoRR, May, 2026

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML.

CoRR, April, 2026

Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation.

Proceedings of the 21st European Conference on Computer Systems, 2026

2025

STAGE: A Symbolic Tensor grAph GEnerator for distributed AI system co-design.

CoRR, November, 2025

COSMIC: Enabling Full-Stack Co-Design and Optimization of Distributed Machine Learning Systems.

CoRR, May, 2025

Toward a Standardized Representation for Deep Learning Collective Algorithms.

IEEE Micro, 2025

2024

LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation.

CoRR, 2024

Towards a Standardized Representation for Deep Learning Collective Algorithms.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2023

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces.

CoRR, 2023

Mystique: Accurate and Scalable Production AI Benchmarks Generation.

CoRR, 2023

Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE.

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2022

Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models.

Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), New York, New York, USA, June 18, 2022

Software-hardware co-design for fast and scalable training of deep learning recommendation models.

Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA '22), New York, New York, USA, June 18, 2022

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022

2021

High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.

CoRR, 2021

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms.

CoRR, 2020

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.

CoRR, 2020

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

2019

Planning for performance: Enhancing achievable performance for MPI through persistent collective operations.

Daniel J. Holmes

Bradley Morgan

Anthony Skjellum

Purushotham V. Bangalore

Srinivas Sridharan

Parallel Comput., 2019

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support.

Sanket Tavarageri

Srinivas Sridharan

Bharat Kaul

CoRR, 2019

TensorFlow at Scale: Performance and productivity analysis of distributed training with Horovod, MLSL, and Cray PE ML.

Concurr. Comput. Pract. Exp., 2019

Training Google Neural Machine Translation on an Intel CPU Cluster.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018

On Scale-out Deep Learning Training for Cloud and HPC.

Srinivas Sridharan

Karthikeyan Vaidyanathan

CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Deep learning at 15PF: supervised and semi-supervised classification for scientific data.

Md. Mostofa Ali Patwary

Proceedings of the International Conference for High Performance Computing, 2017

Planning for performance: persistent collective operations for MPI.

Bradley Morgan

Daniel J. Holmes

Anthony Skjellum

Purushotham V. Bangalore

Srinivas Sridharan

Proceedings of the 24th European MPI Users' Group Meeting, 2017

2016

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.

Dipankar Das

Sasikanth Avancha

Dheevatsa Mudigere

Karthikeyan Vaidyanathan

CoRR, 2016

Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels.

Rob F. Van der Wijngaart

Proceedings of the High Performance Computing - 31st International Conference, 2016

2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints.

Srinivas Sridharan

James Dinan

Dhiraj D. Kalamkar

Proceedings of the International Conference for High Performance Computing, 2014

2012

Extending the BT NAS parallel benchmark to exascale computing.

Rob F. Van der Wijngaart

Srinivas Sridharan

Victor W. Lee

Proceedings of the SC Conference on High Performance Computing Networking, 2012

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2007

Evaluating synchronization techniques for light-weight multithreaded/multicore architectures.

Srinivas Sridharan

Arun Rodrigues

Peter M. Kogge

Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007
