Georg Hager

According to our database, Georg Hager
  • authored at least 112 papers between 2002 and 2018.
  • has a "Dijkstra number" of four.

Bibliography

2018
Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.
TOPC, 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.
CoRR, 2018

On the accuracy and usefulness of analytic energy models for contemporary multicore processors.
CoRR, 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

2017
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems.
International Journal of Parallel Programming, 2017

Validation of hardware events for successful performance pattern identification in High Performance Computing.
CoRR, 2017

PVSC-DTM: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.
CoRR, 2017

CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance.
CoRR, 2017

LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses.
CoRR, 2017

An analysis of core- and chip-level architectural features in four generations of Intel server processors.
CoRR, 2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels.
CoRR, 2017

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors.
Concurrency and Computation: Practice and Experience, 2017

An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Towards an Exascale Enabled Sparse Solver Repository.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations.
J. Comput. Physics, 2016

Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors.
CoRR, 2016

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations.
Concurrency and Computation: Practice and Experience, 2016

Exploring performance and power properties of modern multi-core chips via simple machine models.
Concurrency and Computation: Practice and Experience, 2016

Performance and power for highly parallel systems.
Concurrency and Computation: Practice and Experience, 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

2015
Increasing the Performance of the Jacobi-Davidson Method by Blocking.
SIAM J. Scientific Computing, 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.
SIAM J. Scientific Computing, 2015

Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero.
CoRR, 2015

Building a fault tolerant application using the GASPI communication layer.
CoRR, 2015

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations.
CoRR, 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015

Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization.
CoRR, 2015

GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems.
CoRR, 2015

Analysis of Intel's Haswell Microarchitecture Using The ECM Model and Microbenchmarks.
CoRR, 2015

Performance analysis of the Kahan-enhanced scalar product on current multicore processors.
CoRR, 2015

Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft.
CoRR, 2015

Automatic loop kernel analysis and performance modeling with Kerncraft.
Proceedings of the 6th International Workshop on Performance Modeling, 2015

Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Building a Fault Tolerant Application Using the GASPI Communication Layer.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units.
SIAM J. Scientific Computing, 2014

Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model.
Parallel Processing Letters, 2014

Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices.
CoRR, 2014

Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model.
CoRR, 2014

Multicore-optimized wavefront diamond blocking for optimizing stencil updates.
CoRR, 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.
CoRR, 2014

Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips.
CoRR, 2014

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.
CoRR, 2014

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Overhead Analysis of Performance Counter Measurements.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

ESSEX: Equipping Sparse Solvers for Exascale.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.
Proceedings of the ARCS 2014, 2014

2013
A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems.
Parallel Processing Letters, 2013

Pushing the limits for medical image reconstruction on recent standard multicore processors.
IJHPCA, 2013

An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level.
CoRR, 2013

Optimization of FASTEST-3D for Modern Multicore Systems.
CoRR, 2013

Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication.
CoRR, 2013

Asynchronous MPI for the Masses.
CoRR, 2013

A unified sparse matrix data format for modern processors with wide SIMD units.
CoRR, 2013

Comparison of different propagation steps for lattice Boltzmann methods.
Computers & Mathematics with Applications, 2013

An Evaluation of Different I/O Techniques for Checkpoint/Restart.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Model-guided performance analysis of the sparse matrix-matrix multiplication.
Proceedings of the International Conference on High Performance Computing & Simulation, 2013

2012
Expression Templates Revisited: A Performance Analysis of Current Methodologies.
SIAM J. Scientific Computing, 2012

Exploring performance and power properties of modern multicore chips via simple machine models.
CoRR, 2012

Best practices for HPM-assisted performance engineering on modern multicore processors.
CoRR, 2012

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

High performance smart expression template math libraries.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Performance Engineering: From Numbers to Insight.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011
Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems.
Parallel Processing Letters, 2011

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters.
Parallel Computing, 2011

Efficient multicore-aware parallelization strategies for iterative stencil computations.
J. Comput. Science, 2011

Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation.
CoRR, 2011

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results.
CoRR, 2011

Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations.
CoRR, 2011

Comparison of different Propagation Steps for the Lattice Boltzmann Method.
CoRR, 2011

Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems.
CoRR, 2011

Pushing the limits for medical image reconstruction on recent standard multicore processors.
CoRR, 2011

LIKWID: Lightweight Performance Tools.
CoRR, 2011

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology.
CoRR, 2011

Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems.
CoRR, 2011

Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming.
CoRR, 2011

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA.
Advances in Engineering Software, 2011

Poster: LIKWID: lightweight performance tools.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes.
Proceedings of the Tools for High Performance Computing 2011, 2011

Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Introduction to High Performance Computing for Scientists and Engineers.
Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-439-81192-4, 2011

2010
Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.
Parallel Processing Letters, 2010

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters.
CoRR, 2010

Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters.
CoRR, 2010

LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments.
CoRR, 2010

Efficient multicore-aware parallelization strategies for iterative stencil computations.
CoRR, 2010

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.
Proceedings of the 39th International Conference on Parallel Processing, 2010

2009
Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems.
Parallel Processing Letters, 2009

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
CoRR, 2009

Multi-core architectures: Complexities of performance prediction and the impact of cache topology.
CoRR, 2009

Performance limitations for sparse matrix-vector multiplications on current multicore environments.
CoRR, 2009

Introducing a Performance Model for Bandwidth-Limited Loop Kernels.
CoRR, 2009

A Proof of Concept for Optimizing Task Parallelism by Locality Queues.
CoRR, 2009

Introducing a Performance Model for Bandwidth-Limited Loop Kernels.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes.
Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2009

The world's fastest CPU and SMP node: Some performance results from the NEC SX-9.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization.
Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, 2009

2008
Data Access Characteristics and Optimizations for Sun UltraSPARC T2 and T2+ Systems.
Parallel Processing Letters, 2008

Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks.
CoRR, 2007

Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
CoRR, 2007

2006
Hybrid MPI and OpenMP Parallel Programming.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

2003
Exact Numerical Treatment of Finite Quantum Systems Using Leading-Edge Supercomputers.
Proceedings of the Modeling, 2003

2002
Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers.
Proceedings of the High Performance Computing for Computational Science, 2002
