Srinivas Sridharan

Affiliations:
  • Intel Corporation, Hillsboro, OR, USA
  • Intel Corporation, Bangalore, India
  • University of Notre Dame, IN, USA


According to our database, Srinivas Sridharan authored at least 27 papers between 2007 and 2023.

Bibliography

2023
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces.
CoRR, 2023

Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

2022
Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022


Impact of RoCE Congestion Control Policies on Distributed Training of DNNs.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022

2021
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021

Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

2020
Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms.
CoRR, 2020

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

2019
Planning for performance: Enhancing achievable performance for MPI through persistent collective operations.
Parallel Comput., 2019

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support.
CoRR, 2019

TensorFlow at Scale: Performance and productivity analysis of distributed training with Horovod, MLSL, and Cray PE ML.
Concurr. Comput. Pract. Exp., 2019

Training Google Neural Machine Translation on an Intel CPU Cluster.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
On Scale-out Deep Learning Training for Cloud and HPC.
CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Deep learning at 15PF: supervised and semi-supervised classification for scientific data.
Proceedings of the International Conference for High Performance Computing, 2017

Planning for performance: persistent collective operations for MPI.
Proceedings of the 24th European MPI Users' Group Meeting, 2017

2016
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.
CoRR, 2016

Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels.
Proceedings of the High Performance Computing - 31st International Conference, 2016

2015
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints.
Proceedings of the International Conference for High Performance Computing, 2014

2012
Extending the BT NAS parallel benchmark to exascale computing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2007
Evaluating synchronization techniques for light-weight multithreaded/multicore architectures.
Proceedings of SPAA 2007: the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007

