Alexander Heinecke

According to our database, Alexander Heinecke authored at least 53 papers between 2007 and 2019.

Bibliography

2019
High-Performance Deep Learning via a Single Building Block.
CoRR, 2019

A Study of BFLOAT16 for Deep Learning Training.
CoRR, 2019

Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations.
CoRR, 2019

Petaflop Seismic Simulations in the Public Cloud.
Proceedings of the High Performance Computing - 34th International Conference, 2019

ISA mapper: a compute and hardware agnostic deep learning compiler.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler.
CoRR, 2018

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures.
CoRR, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
CoRR, 2018

Anatomy of high-performance deep learning convolutions on SIMD architectures.
Proceedings of the International Conference for High Performance Computing, 2018

Mixed Precision Training of Convolutional Neural Networks using Integer Operations.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Accelerating Seismic Simulations Using the Intel Xeon Phi Knights Landing Processor.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

EDGE: Extreme Scale Fused Seismic Simulations with the Discontinuous Galerkin Method.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016
Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
IJHPCA, 2016

Data mining on vast data sets as a cluster system benchmark.
Concurrency and Computation: Practice and Experience, 2016

Efficiency of High Order Spectral Element Methods on Petascale Architectures.
Proceedings of the High Performance Computing - 31st International Conference, 2016

High Order Seismic Simulations on the Intel Xeon Phi Processor (Knights Landing).
Proceedings of the High Performance Computing - 31st International Conference, 2016

LIBXSMM: accelerating small matrix multiplications by runtime code generation.
Proceedings of the International Conference for High Performance Computing, 2016

Petascale Local Time Stepping for the ADER-DG Finite Element Method.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Supercomputing for Molecular Dynamics Simulations - Handling Multi-Trillion Particles in Nanofluidics
Springer Briefs in Computer Science, Springer, ISBN: 978-3-319-17148-7, 2015

Beacon: Deployment and Application of Intel Xeon Phi Coprocessors for Scientific Computing.
Computing in Science and Engineering, 2015

Cache-oblivious matrix algorithms in the age of multicores and many cores.
Concurrency and Computation: Practice and Experience, 2015

High-Order ADER-DG Minimizes Energy- and Time-to-Solution of SeisSol.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors.
Proceedings of the International Conference for High Performance Computing, 2015

Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Optimized Force Calculation in Molecular Dynamics Simulations for the Intel Xeon Phi.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014
Boosting Scientific Computing Applications through Leveraging Data Parallel Architectures.
PhD thesis, 2014

ls1 mardyn: The massively parallel molecular dynamics code for large systems.
CoRR, 2014

Parallelizing a Black-Scholes solver based on finite elements and sparse grids.
Concurrency and Computation: Practice and Experience, 2014

Sustained Petascale Performance of Seismic Simulations with SeisSol on SuperMUC.
Proceedings of the Supercomputing - 29th International Conference, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014

Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers.
Proceedings of the International Conference for High Performance Computing, 2014

Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Emerging Architectures Enable to Boost Massively Parallel Data Mining Using Adaptive Sparse Grids.
International Journal of Parallel Programming, 2013

591 TFLOPS Multi-trillion Particles Simulation on SuperMUC.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Many-core architectures boost the pricing of basket options on adaptive sparse grids.
Proceedings of WHPCF'13: 6th Workshop on High Performance Computational Finance, 2013

Accelerating SeisSol by Generating Vectorized Code for Sparse Matrix Operators.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Accelerators in scientific computing is it worth the effort?
Proceedings of the International Conference on High Performance Computing & Simulation, 2013

2012
Option pricing with a direct adaptive sparse grid approach.
J. Computational Applied Mathematics, 2012

A highly parallel Black-Scholes solver based on adaptive sparse grids.
Int. J. Comput. Math., 2012

From GPGPU to Many-Core: Nvidia Fermi and Intel Many Integrated Core Architecture.
Computing in Science and Engineering, 2012

Exploiting State-of-the-Art x86 Architectures in Scientific Computing.
Proceedings of the 11th International Symposium on Parallel and Distributed Computing, 2012

HPCS 2012 panels: Panel I: Energy efficient systems in next generation high performance data and compute centers.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Sparse grid classifiers as base learners for AdaBoost.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Solving High-Dimensional Problems on Processors with Integrated GPU.
Proceedings of the Facing the Multicore-Challenge, 2012

An efficient vectorization of linked-cell particle simulations.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Towards High-Performance Implementations of a Custom HPC Kernel Using Intel® Array Building Blocks.
Proceedings of the Facing the Multicore - Challenge II, 2011

Extending a Highly Parallel Data Mining Algorithm to the Intel® Many Integrated Core Architecture.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Multi- and many-core data mining with adaptive sparse grids.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
Parallelizing a Black-Scholes solver based on finite elements and sparse grids.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2007
Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves.
Proceedings of the Parallel Processing and Applied Mathematics, 2007

