Abdul Dakkak

According to our database, Abdul Dakkak authored at least 25 papers between 2014 and 2021.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2021
FFT blitz: the tensor cores strike back.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Compiling high-level scripting languages to performant code.
PhD thesis, 2020

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale.
CoRR, 2020

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs.
Proceedings of the ICPE '20: ACM/SPEC International Conference on Performance Engineering, 2020

DLSpec: A Deep Learning Task Exchange Specification.
Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020

Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

The design and implementation of the Wolfram Language compiler.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

The Design and Implementation of a Scalable Deep Learning Benchmarking Platform.
Proceedings of the 13th IEEE International Conference on Cloud Computing, 2020

2019
The Design and Implementation of a Scalable DL Benchmarking Platform.
CoRR, 2019

Across-Stack Profiling and Characterization of Machine Learning Models on GPUs.
CoRR, 2019

Challenges and Pitfalls of Reproducing Machine Learning Artifacts.
CoRR, 2019

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

MLModelScope: Evaluate and Introspect Cognitive Pipelines.
Proceedings of the 2019 IEEE World Congress on Services, 2019

Accelerating reduction and scan using tensor core units.
Proceedings of the ACM International Conference on Supercomputing, 2019

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service.
Proceedings of the 12th IEEE International Conference on Cloud Computing, 2019

2018
MLModelScope: Evaluate and Measure ML Models within AI Pipelines.
CoRR, 2018

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments.
CoRR, 2018

SCOPE: C3SR Systems Characterization and Benchmarking Framework.
CoRR, 2018

2017
RAI: A Scalable Project Submission System for Parallel Programming Courses.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
A programming system for future proofing performance critical libraries.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

WebGPU: A Scalable Online Development Platform for GPU Programming Courses.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Enhancing the Usability and Utilization of Accelerated Architectures via Docker.
Proceedings of the 8th IEEE/ACM International Conference on Utility and Cloud Computing, 2015

2014
Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

