Eiman Ebrahimi

According to our database, Eiman Ebrahimi authored at least 27 papers between 2006 and 2021.

Bibliography

2021
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture.
Proc. VLDB Endow., 2021

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses.
CoRR, 2021

SiP-ML: high-bandwidth optical network interconnects for machine learning training.
Proceedings of the ACM SIGCOMM 2021 Conference, Virtual Event, USA, August 23-27, 2021

2020
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs.
Proc. VLDB Endow., 2020

Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020.
CoRR, 2020

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems.
ACM Trans. Archit. Code Optim., 2019

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training.
IEEE Micro, 2019

2018
Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality In GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Beyond the socket: NUMA-aware GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Accelerating Dependent Cache Misses with an Enhanced Memory Controller.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Selective GPU caches to eliminate CPU-GPU HW cache coherence.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

2015
Flexible software profiling of GPU architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2012
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems.
ACM Trans. Comput. Syst., 2012

Energy Savings via Dead Sub-Block Prediction.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Predicting Performance Impact of DVFS for Realistic Memory Systems.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

2011
Parallel application memory scheduling.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Prefetch-aware shared resource management for multi-core systems.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

2010
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
Coordinated control of multiple prefetchers in multi-core systems.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

2006
DCim++: a C++ library for object oriented hardware design and distributed simulation.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

