Georg Hager

Orcid: 0000-0002-8723-2781

Affiliations:
  • Erlangen National High Performance Computing Center, Germany


According to our database, Georg Hager authored at least 130 papers between 2002 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms.
Future Gener. Comput. Syst., December, 2023

Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives.
Future Gener. Comput. Syst., November, 2023

Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations.
ACM Trans. Parallel Comput., September, 2023

Analytical performance estimation during code generation on modern GPUs.
J. Parallel Distributed Comput., March, 2023

Level-Based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication.
IEEE Trans. Parallel Distributed Syst., February, 2023

The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs.
IEEE Trans. Parallel Distributed Syst., February, 2023

CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.
CoRR, 2023

Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs.
CoRR, 2023

MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages.
CoRR, 2023

Core-Level Performance Engineering with the Open-Source Architecture Code Analyzer (OSACA) and the Compiler Explorer.
Proceedings of the Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Application Knowledge Required: Performance Modeling for Fun and Profit.
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Physical Oscillator Model for Supercomputing.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX.
Concurr. Comput. Pract. Exp., 2022

Analytic performance model for parallel overlapping memory-bound kernels.
Concurr. Comput. Pract. Exp., 2022

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

Addressing White-box Modeling and Simulation Challenges in Parallel Computing.
Proceedings of the SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation, Atlanta, GA, USA, June 8, 2022

2021
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.
Int. J. High Perform. Comput. Appl., 2021

Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs.
Int. J. High Perform. Comput. Appl., 2021

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX.
CoRR, 2021

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Opening the Black Box: Performance Estimation during Code Generation for GPUs.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication.
ACM Trans. Parallel Comput., 2020

PHIST: A Pipelined, Hybrid-Parallel Iterative Solver Toolkit.
ACM Trans. Math. Softw., 2020

Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors.
Supercomput. Front. Innov., 2020

Analytic performance modeling and analysis of detailed neuron simulations.
Int. J. High Perform. Comput. Appl., 2020

An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs.
CoRR, 2020

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019
CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance.
IEEE Trans. Parallel Distributed Syst., 2019

Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT.
Supercomput. Front. Innov., 2019

Delay Propagation and Overlapping Mechanisms on Clusters: A Case Study of Idle Periods based on Workload, Communication, and Delay Granularity.
CoRR, 2019

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs.
CoRR, 2019

Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Performance Engineering for a Tall & Skinny Matrix Multiplication Kernels on GPUs.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.
ACM Trans. Parallel Comput., 2018

Performance Engineering.
Inform. Spektrum, 2018

Building and utilizing fault tolerance support tools for the GASPI applications.
Int. J. High Perform. Comput. Appl., 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.
CoRR, 2018

Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

2017
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems.
Int. J. Parallel Program., 2017

Validation of hardware events for successful performance pattern identification in High Performance Computing.
CoRR, 2017

PVSC-DTM: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.
CoRR, 2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels.
CoRR, 2017

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors.
Concurr. Comput. Pract. Exp., 2017

An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Towards an Exascale Enabled Sparse Solver Repository.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations.
J. Comput. Phys., 2016

Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors.
CoRR, 2016

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations.
Concurr. Comput. Pract. Exp., 2016

Exploring performance and power properties of modern multi-core chips via simple machine models.
Concurr. Comput. Pract. Exp., 2016

Performance and power for highly parallel systems.
Concurr. Comput. Pract. Exp., 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

2015
Increasing the Performance of the Jacobi-Davidson Method by Blocking.
SIAM J. Sci. Comput., 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.
SIAM J. Sci. Comput., 2015

Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero.
CoRR, 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015

Performance analysis of the Kahan-enhanced scalar product on current multicore processors.
CoRR, 2015

Automatic loop kernel analysis and performance modeling with Kerncraft.
Proceedings of the 6th International Workshop on Performance Modeling, 2015

Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Performance Engineering of the Kernel Polynomal Method on Large-Scale CPU-GPU Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Building a Fault Tolerant Application Using the GASPI Communication Layer.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units.
SIAM J. Sci. Comput., 2014

Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model.
Parallel Process. Lett., 2014

Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices.
CoRR, 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.
CoRR, 2014

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Overhead Analysis of Performance Counter Measurements.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

ESSEX: Equipping Sparse Solvers for Exascale.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.
Proceedings of the ARCS 2014, 2014

2013
A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems.
Parallel Process. Lett., 2013

Pushing the limits for medical image reconstruction on recent standard multicore processors.
Int. J. High Perform. Comput. Appl., 2013

An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level.
CoRR, 2013

Optimization of FASTEST-3D for Modern Multicore Systems.
CoRR, 2013

Asynchronous MPI for the Masses.
CoRR, 2013

A unified sparse matrix data format for modern processors with wide SIMD units.
CoRR, 2013

Comparison of different propagation steps for lattice Boltzmann methods.
Comput. Math. Appl., 2013

An Evaluation of Different I/O Techniques for Checkpoint/Restart.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Model-guided performance analysis of the sparse matrix-matrix multiplication.
Proceedings of the International Conference on High Performance Computing & Simulation, 2013

2012
Expression Templates Revisited: A Performance Analysis of Current Methodologies.
SIAM J. Sci. Comput., 2012

Exploring performance and power properties of modern multicore chips via simple machine models.
CoRR, 2012

Best practices for HPM-assisted performance engineering on modern multicore processors.
CoRR, 2012

Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

High performance smart expression template math libraries.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Performance Engineering: From Numbers to Insight.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011
Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems.
Parallel Process. Lett., 2011

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters.
Parallel Comput., 2011

Efficient multicore-aware parallelization strategies for iterative stencil computations.
J. Comput. Sci., 2011

Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results.
CoRR, 2011

Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations.
CoRR, 2011

Comparison of different Propagation Steps for the Lattice Boltzmann Method.
CoRR, 2011

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology.
CoRR, 2011

Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems.
CoRR, 2011

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA.
Adv. Eng. Softw., 2011

Poster: LIKWID: lightweight performance tools.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes.
Proceedings of the Tools for High Performance Computing 2011, 2011

Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Introduction to High Performance Computing for Scientists and Engineers.
Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-439-81192-4, 2011

2010
Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.
Parallel Process. Lett., 2010

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.
Proceedings of the 39th International Conference on Parallel Processing, 2010

LIKWID: Lightweight Performance Tools.
Proceedings of the Competence in High Performance Computing 2010, 2010

2009
Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems.
Parallel Process. Lett., 2009

Multi-core architectures: Complexities of performance prediction and the impact of cache topology.
CoRR, 2009

Performance limitations for sparse matrix-vector multiplications on current multicore environments.
CoRR, 2009

A Proof of Concept for Optimizing Task Parallelism by Locality Queues.
CoRR, 2009

Introducing a Performance Model for Bandwidth-Limited Loop Kernels.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

The world's fastest CPU and SMP node: Some performance results from the NEC SX-9.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization.
Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, 2009

2008
Data Access Characteristics and Optimizations for Sun UltraSPARC T2 and T2+ Systems.
Parallel Process. Lett., 2008

Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Vector Computers in a World of Commodity Clusters, Massively Parallel Systems and Many-Core Many-Threaded CPUs: Recent Experience Based on an Advanced Lattice Boltzmann Flow Solver.
Proceedings of the High Performance Computing in Science and Engineering '08, 2008

2007
RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks.
CoRR, 2007

2006
Hybrid MPI and OpenMP Parallel Programming.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

2003
Exact Numerical Treatment of Finite Quantum Systems Using Leading-Edge Supercomputers.
Proceedings of the Modeling, 2003

2002
Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers.
Proceedings of the High Performance Computing for Computational Science, 2002

