Srinivas Sridharan

According to our database, Srinivas Sridharan authored at least 17 papers between 2007 and 2020.

Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms.
CoRR, 2020

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems.
CoRR, 2020

ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support.
CoRR, 2019

TensorFlow at Scale: Performance and productivity analysis of distributed training with Horovod, MLSL, and Cray PE ML.
Concurr. Comput. Pract. Exp., 2019

Training Google Neural Machine Translation on an Intel CPU Cluster.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

On Scale-out Deep Learning Training for Cloud and HPC.
CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
Proceedings of the 6th International Conference on Learning Representations, 2018

Deep learning at 15PF: supervised and semi-supervised classification for scientific data.
Proceedings of the International Conference for High Performance Computing, 2017

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.
CoRR, 2016

Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels.
Proceedings of the High Performance Computing - 31st International Conference, 2016

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints.
Proceedings of the International Conference for High Performance Computing, 2014

Extending the BT NAS parallel benchmark to exascale computing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Evaluating synchronization techniques for light-weight multithreaded/multicore architectures.
SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007