Khaled Z. Ibrahim

According to our database, Khaled Z. Ibrahim authored at least 39 papers between 2001 and 2017.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2017
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends.
J. Parallel Distrib. Comput., 2017

Reaching bandwidth saturation using transparent injection parallelization.
IJHPCA, 2017

APHiD: Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Scaling Spark on Lustre.
Proceedings of the High Performance Computing, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.
Proceedings of the International Conference for High Performance Computing, 2016

Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes.
Proceedings of the Second International Symposium on Memory Systems, 2016

Scaling Spark on HPC Systems.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015
Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers.
CoRR, 2015

Exploiting communication concurrency on high performance computing systems.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

2014
The Case for Partitioning Virtual Machines on Multicore Architectures.
IEEE Trans. Parallel Distrib. Syst., 2014

Efficient Interoperability of OpenSHMEM on Multicore Architectures.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using cray communication domains.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Analysis and tuning of libtensor framework on multicore architectures.
Proceedings of the 21st International Conference on High Performance Computing, 2014

2013
Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.
IJHPCA, 2013

Kinetic turbulence simulations at extreme scale on leadership-class systems.
Proceedings of the International Conference for High Performance Computing, 2013

2012
Code Development of High-Performance Applications for Power-Efficient Architectures.
Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Congestion avoidance on manycore high performance computing systems.
Proceedings of the International Conference on Supercomputing, 2012

Concurrent Phase Classification for Accelerating MPSoC Simulation.
Proceedings of the ARCS 2012 Workshops, 28. Februar - 2. März 2012, München, Germany, 2012

2011
Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms.
Parallel Computing, 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Optimized pre-copy live migration for memory intensive applications.
Proceedings of the Conference on High Performance Computing Networking, 2011

Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Parallel application sampling for accelerating MPSoC simulation.
Design Autom. for Emb. Sys., 2010

Characterizing the Relation Between Apex-Map Synthetic Probes and Reuse Distance Distributions.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Bridging the gap between complex software paradigms and power-efficient parallel architectures.
Proceedings of the International Green Computing Conference 2010, 2010

2009
Power-Aware Bus Coscheduling for Periodic Realtime Applications Running on Multiprocessor SoC.
Trans. HiPEAC, 2009

Efficient SIMDization and data management of the Lattice QCD computation on the Cell Broadband Engine.
Scientific Programming, 2009

2008
Fine-grained parallelization of lattice QCD kernel routine on GPUs.
J. Parallel Distrib. Comput., 2008

Implementing Wilson-Dirac operator on the cell broadband engine.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Multi-granularity sampling for simulating concurrent heterogeneous applications.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
Adaptive Sampling for Efficient MPSoC Architecture Simulation.
Proceedings of the 15th International Symposium on Modeling, 2007

2005
Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Efficient Architectural Support for Secure Bus-Based Shared Memory Multiprocessor.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2003
Extending OpenMP to Support Slipstream Execution Mode.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Slipstream Execution Mode for CMP-Based Multiprocessors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2001
On the Exploitation of Value Prediction and Producer Identification to Reduce Barrier Synchronization Time.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

