Bharat Kaul

According to our database¹, Bharat Kaul authored at least 32 papers between 2012 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer.

CoRR, April, 2026

2024

Generative Active Learning for the Search of Small-molecule Protein Binders.

CoRR, 2024

2023

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks.

CoRR, 2023

2022

Accelerating Deep Learning based Identification of Chromatin Accessibility from noisy ATAC-seq Data.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021

PolyDL: Polyhedral Optimizations for Creation of High-performance DL Primitives.

Ramakrishna Upadrasta

ACM Trans. Archit. Code Optim., 2021

MADRaS : Multi Agent Driving Simulator.

J. Artif. Intell. Res., 2021

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning.

CoRR, 2021

AI Powered Compiler Techniques for DL Code Optimization.

Ramakrishna Upadrasta

CoRR, 2021

GNNerator: A Hardware/Software Framework for Accelerating Graph Neural Networks.

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

SEERL: Sample Efficient Ensemble Reinforcement Learning.

Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives.

Ramakrishna Upadrasta

Bharat Kaul

CoRR, 2020

SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract).

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

K-TanH: Hardware Efficient Activations For Deep Learning.

CoRR, 2019

High Performance Scalable FPGA Accelerator for Deep Neural Networks.

CoRR, 2019

Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support.

Sanket Tavarageri

Srinivas Sridharan

Bharat Kaul

CoRR, 2019

Mixed Precision Training With 8-bit Floating Point.

CoRR, 2019

A Study of BFLOAT16 for Deep Learning Training.

Nataraj Jammalamadaka

CoRR, 2019

Manna: An Accelerator for Memory-Augmented Neural Networks.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

X-MANN: A Crossbar based Architecture for Memory Augmented Neural Networks.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

On Scale-out Deep Learning Training for Cloud and HPC.

Srinivas Sridharan

Karthikeyan Vaidyanathan

CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.

Proceedings of the 6th International Conference on Learning Representations, 2018

Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-Out Classifiers.

Apoorv Vyas

Nataraj Jammalamadaka

Proceedings of the Computer Vision - ECCV 2018, 2018

RAIL: Risk-Averse Imitation Learning.

Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017

Ternary Neural Networks with Fine-Grained Quantization.

CoRR, 2017

Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point.

CoRR, 2017

Ternary Residual Networks.

CoRR, 2017

ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks.

Swagath Venkataramani

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.

Dipankar Das

Sasikanth Avancha

Dheevatsa Mudigere

Karthikeyan Vaidyanathan

CoRR, 2016

2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.

Karthikeyan Vaidyanathan

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2012

High Performance Non-uniform FFT on Modern X86-based Multi-core Systems.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
