Tal Ben-Nun

According to our database, Tal Ben-Nun authored at least 25 papers between 2009 and 2019.

Bibliography

2019
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.
ACM Comput. Surv., 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.
CoRR, 2019

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations.
CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.
CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.
CoRR, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.
CoRR, 2019

Augment your batch: better training with larger batches.
CoRR, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Substream-Centric Maximum Matchings on FPGA.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018
Neural Code Comprehension: A Learnable Representation of Code Semantics.
CoRR, 2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.
CoRR, 2018

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis.
CoRR, 2018

Neural Code Comprehension: A Learnable Representation of Code Semantics.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Big data causing big (TLB) problems: taming random memory accesses on the GPU.
Proceedings of the 13th International Workshop on Data Management on New Hardware, 2017

2016
FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Spline-based parallel nonlinear optimization of function sequences.
J. Parallel Distrib. Comput., 2016

Reciprocal Grids: A Hierarchical Algorithm for Computing Solution X-ray Scattering Curves from Supramolecular Complexes at High Resolution.
Journal of Chemical Information and Modeling, 2016

Adaptive Work-Efficient Connected Components on the GPU.
CoRR, 2016

2015
Memory access patterns: the missing piece of the multi-GPU puzzle.
Proceedings of the International Conference for High Performance Computing, 2015

2014
MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction.
TACO, 2014

2010
Design and implementation of a generic resource sharing virtual time dispatcher.
Proceedings of SYSTOR 2010: The 3rd Annual Haifa Experimental Systems Conference, 2010

2009
A global scheduling framework for virtualization environments.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

