Christoph W. Kessler

ORCID: 0000-0001-5241-0026

Affiliations:
  • Linköping University, Department of Computer and Information Science, Sweden
  • University of Trier, Germany (former)
  • Saarland University, Saarbrücken, Germany (former)


According to our database, Christoph W. Kessler authored at least 151 papers between 1991 and 2024.

Bibliography

2024
High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns.
Int. J. Parallel Program., August, 2024

Packet-Type Aware Scheduling of Moldable Streaming Tasks on Multicore Systems with DVFS.
Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 2024

2023
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems.
Int. J. Parallel Program., February, 2023

Packing Multiple Types of Cores for Energy-Optimized Heterogeneous Hardware-Software Co-Design of Moldable Streaming Computations.
IEEE Access, 2023

2022
Integrating Energy-Optimizing Scheduling of Moldable Streaming Tasks with Design Space Exploration for Multiple Core Types on Configurable Platforms.
J. Signal Process. Syst., 2022

EXA2PRO: A Framework for High Development Productivity on Heterogeneous Computing Systems.
IEEE Trans. Parallel Distributed Syst., 2022

A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems.
Int. J. Parallel Program., 2022

Analyzing Programming Effort Model Accuracy of High-Level Parallel Programs for Stream Processing.
Proceedings of the 48th Euromicro Conference on Software Engineering and Advanced Applications, 2022

2021
Crown-scheduling of sets of parallelizable tasks for robustness and energy-elasticity on many-core systems with discrete dynamic voltage and frequency scaling.
J. Syst. Archit., 2021

SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters.
Int. J. Parallel Program., 2021

Temperature-Aware Energy-Optimal Scheduling of Moldable Streaming Tasks onto 2D-Mesh-Based Many-Core CPUs with DVFS.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2021

Combining Design Space Exploration with Task Scheduling of Moldable Streaming Tasks on Reconfigurable Platforms.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2021

2020
Hybrid CPU-GPU execution support in the skeleton programming framework SkePU.
J. Supercomput., 2020

Static Scheduling of Moldable Streaming Tasks With Task Fusion for Parallel Systems With DVFS.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Programming languages for data-Intensive HPC applications: A systematic mapping study.
Parallel Comput., 2020

Leveraging access mode declarations in a model for memory consistency in heterogeneous systems.
J. Log. Algebraic Methods Program., 2020

Guest Editor's Note: High-Level Parallel Programming 2019.
Int. J. Parallel Program., 2020

Portable exploitation of parallel and heterogeneous HPC architectures in neural simulation using SkePU.
Proceedings of the SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, 2020

Voltage Island-Aware Energy-Efficient Scheduling of Parallel Streaming Tasks on Many-Core CPUs.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

Maximizing Profit in Energy-Efficient Moldable Task Execution with Deadline.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

Robustness and Energy-elasticity of Crown Schedules for Sets of Parallelizable Tasks on Many-core Systems with DVFS.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

2019
Parallelization of Hierarchical Matrix Algorithms for Electromagnetic Scattering Problems.
Proceedings of the High-Performance Modelling and Simulation for Big Data Applications, 2019

Extending smart containers for data locality-aware skeleton programming.
Concurr. Comput. Pract. Exp., 2019

Global optimization of operand transfer fusion in heterogeneous computing.
Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems, 2019

Multi-Variant User Functions for Platform-Aware Skeleton Programming.
Proceedings of the Parallel Computing: Technology Trends, 2019

Scheduling Moldable Parallel Streaming Tasks on Heterogeneous Platforms with Frequency Scaling.
Proceedings of the 27th European Signal Processing Conference, 2019

Adaptive Crown Scheduling for Streaming Tasks on Many-Core Systems with Discrete DVFS.
Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

Co-Optimizing Core Allocation, Mapping and DVFS in Streaming Programs with Moldable Tasks for Energy Efficient Execution on Manycore Architectures.
Proceedings of the 19th International Conference on Application of Concurrency to System Design, 2019

2018
MeterPU: a generic measurement abstraction API - Enabling energy-tuned skeleton backend selection.
J. Supercomput., 2018

SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems.
Int. J. Parallel Program., 2018

EXA2PRO programming environment: architecture and applications.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

Lazy Allocation and Transfer Fusion Optimization for GPU-Based Heterogeneous Systems.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Ensuring Memory Consistency in Heterogeneous Systems Based on Access Mode Declarations.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption.
Proceedings of the 2017 Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, 2017

Asymmetric Crown Scheduling.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

VectorPU: A Generic and Efficient Data-container and Component Model for Transparent Data Transfer on GPU-based Heterogeneous Systems.
Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 2017

2016
Pruning strategies in adaptive off-line tuning for optimized composition of components on heterogeneous systems.
Parallel Comput., 2016

Smart Containers and Skeleton Programming for GPU-Based Systems.
Int. J. Parallel Program., 2016

Energy-Optimized Static Scheduling for Many-Cores with Task Parallelization, DVFS and Core Consolidation.
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, 2016

An Extensible Platform Description Language Supporting Retargetable Toolchains and Adaptive Execution.
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, 2016

Efficient Execution of SkePU Skeleton Programs on the Low-Power Multicore Processor Myriad2.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

2015
Performance-aware composition framework for GPU-based systems.
J. Supercomput., 2015

Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems.
Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, 2015

Portable Parallelization of the EDGE CFD Application for GPU-based Systems using the SkePU Skeleton Programming Library.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Improving Energy-Efficiency of Static Schedules by Core Consolidation and Switching Off Unused Cores.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Optimized variant-selection code generation for loops on heterogeneous multicore systems.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Mimer and Schedeval: Tools for Comparing Static Schedulers for Streaming Applications on Manycore Architectures.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015

XPDL: Extensible Platform Description Language to Support Energy Modeling and Optimization.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015

2014
Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Manycore Systems.
ACM Trans. Archit. Code Optim., 2014

NUMA Computing with Hardware and Software Co-Support on Configurable Emulated Shared Memory Architectures.
Int. J. Netw. Comput., 2014

Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers.
CoRR, 2014

The PEPPHER composition tool: performance-aware composition for GPU-based systems.
Computing, 2014

Global Optimization of Execution Mode Selection for the Reconfigurable PRAM-NUMA Multicore Architecture REPLICA.
Proceedings of the Second International Symposium on Computing and Networking, 2014

Optimized Selection of Runtime Mode for the Reconfigurable PRAM-NUMA Architecture REPLICA Using Machine-Learning.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

A Quantitative Comparison of PRAM based Emulated Shared Memory Architectures to Current Multicore CPUs and GPUs.
Proceedings of the ARCS 2014, 2014

2013
Compiling for VLIW DSPs.
Proceedings of the Handbook of Signal Processing Systems, 2013

Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization.
Int. J. Parallel Program., 2013

Crown scheduling: Energy-efficient resource allocation, mapping and discrete frequency scaling for collections of malleable streaming tasks.
Proceedings of the 2013 23rd International Workshop on Power and Timing Modeling, 2013

Hardware and Software Support for NUMA Computing on Configurable Emulated Shared Memory Architectures.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A Framework for Performance-Aware Composition of Applications for GPU-Based Systems.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Adaptive Implementation Selection in the SkePU Skeleton Programming Library.
Proceedings of the Advanced Parallel Processing Technologies, 2013

2012
Integrated Code Generation for Loops.
ACM Trans. Embed. Comput. Syst., 2012

Engineering Parallel Sorting for the Intel SCC.
Proceedings of the International Conference on Computational Science, 2012

Executing PRAM Programs on GPUs.
Proceedings of the International Conference on Computational Science, 2012

Optimized On-Chip-Pipelining for Memory-Intensive Computations on Multi-Core Processors with Explicit Memory Hierarchy.
J. Univers. Comput. Sci., 2012

Optimized composition of performance-aware parallel components.
Concurr. Comput. Pract. Exp., 2012

Adaptive Off-Line Tuning for Optimized Composition of Components for Heterogeneous Many-Core Systems.
Proceedings of the High Performance Computing for Computational Science, 2012

Poster: Leveraging PEPPHER Technology for Performance Portable Supercomputing.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Leveraging PEPPHER Technology for Performance Portable Supercomputing.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

The PEPPHER Composition Tool: Performance-Aware Dynamic Composition of Applications for GPU-Based Systems.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Modelling Power Consumption of the Intel SCC.
Proceedings of the 6th Many-core Applications Research Community (MARC) Symposium, 2012

Design of the Language Replica for Hybrid PRAM-NUMA Many-core Architectures.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Programmability and performance portability aspects of heterogeneous multi-/manycore systems.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Flexible Scheduling and Thread Allocation for Synchronous Parallel Tasks.
Proceedings of the ARCS 2012 Workshops, February 28 - March 2, 2012, Munich, Germany, 2012

Programming the Cell Processor.
Fundamentals of Multicore Software Development, 2012

2011
PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems.
IEEE Micro, 2011

Programmiertechniken für den Cell-Prozessor (Programming Techniques for the Cell Processor).
it Inf. Technol., 2011

Comparing Machine Learning Approaches for Context-Aware Composition.
Proceedings of the Software Composition - 10th International Conference, 2011

Balancing CPU Load for Irregular MPI Applications.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-based Systems.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Investigation of main memory bandwidth on Intel Single-Chip Cloud Computer.
Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems.
Proceedings of the 4th International Workshop on Multicore Software Engineering, 2011

Case Study of Efficient Parallel Memory Access Programming for the Embedded Heterogeneous Multicore DSP Architecture ePUMA.
Proceedings of the International Conference on Complex, 2011

2010
Theory and Algorithms for Parallel Computation.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Optimized On-Chip-Pipelined Mergesort on the Cell/B.E.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Program Composition and Optimization: An Introduction.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

10191 Executive Summary - Program Composition and Optimization : Autotuning, Scheduling, Metaprogramming and Beyond.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

10191 Abstracts Collection - Program Composition and Optimization : Autotuning, Scheduling, Metaprogramming and Beyond.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

Platform-independent modeling of explicitly parallel programs.
Proceedings of the ARCS '10, 2010

Compiling for VLIW DSPs.
Proceedings of the Handbook of Signal Processing Systems, 2010

2009
Message from the PDSEC-09 workshop chairs.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Integrated Modulo Scheduling for Clustered VLIW Architectures.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

2008
Automatic parallelization of simulation code for equation-based models with software pipelining and measurements on three platforms.
SIGARCH Comput. Archit. News, 2008

Optimized on-chip pipelining of memory-intensive computations on the cell BE.
SIGARCH Comput. Archit. News, 2008

Profile-Guided Composition.
Proceedings of the Software Composition - 7th International Symposium, 2008

Optimal vs. heuristic integrated code generation for clustered VLIW architectures.
Proceedings of the 11th International Workshop on Software and Compilers for Embedded Systems, 2008

BlockLib: a skeleton library for cell broadband engine.
Proceedings of the 1st International Workshop on Multicore Software Engineering, 2008

Optimized Pipelined Parallel Merge Sort on the Cell BE.
Proceedings of the Euro-Par 2008 Workshops, 2008

Hybrid Parallel Sort on the Cell Processor.
Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008

2007
Classification and generation of schedules for VLIW processors.
Concurr. Comput. Pract. Exp., 2007

A Survey of Reasoning in Parallelization.
Proceedings of the 8th ACIS International Conference on Software Engineering, 2007

A Framework for Performance-Aware Composition of Explicitly Parallel Components.
Proceedings of the Parallel Computing: Architectures, 2007

A Formal Framework for Automated Round-Trip Software Engineering in Static Aspect Weaving and Transformations.
Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), 2007

2006
Optimal integrated code generation for VLIW architectures.
Concurr. Comput. Pract. Exp., 2006

NestStepModelica - Mathematical Modeling and Bulk-Synchronous Parallel Simulation.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Automated Round-trip Software Engineering in Aspect Weaving Systems.
Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), 2006

Crosscutting Concerns in Parallelization by Invasive Software Composition and Aspect Weaving.
Proceedings of the 39th Hawaii International Conference on Systems Science (HICSS-39 2006), 2006

Optimal Integrated VLIW Code Generation with Integer Linear Programming.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Load balancing of irregular parallel divide-and-conquer algorithms in group-SPMD programming environments.
Proceedings of the ARCS 2006, 2006

2005
05101 Abstracts Collection - Scheduling for Parallel Architectures: Theory, Applications, Challenges.
Proceedings of the Scheduling for Parallel Architectures: Theory, Applications, Challenges, 2005

05101 Executive Summary - Scheduling for Parallel Architectures: Theory, Applications, Challenges.
Proceedings of the Scheduling for Parallel Architectures: Theory, Applications, Challenges, 2005

Parallelisation of Sequential Programs by Invasive Composition and Aspect Weaving.
Proceedings of the Advanced Parallel Processing Technologies, 6th International Workshop, 2005

2004
Managing distributed shared arrays in a bulk-synchronous parallel programming environment.
Concurr. Comput. Pract. Exp., 2004

A practical access to the theory of parallel algorithms.
Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education, 2004

Towards a Bulk-Synchronous Distributed Shared Memory Programming Environment for Grids.
Proceedings of the Applied Parallel Computing, 2004

Topic 10: Parallel Programming: Models, Methods and Programming Languages.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Exploiting Symmetries for Optimal Integrated Code Generation.
Proceedings of the International Conference on Embedded Systems and Applications, 2004

2002
Optimal integrated code generation for clustered VLIW architectures.
Proceedings of the 2002 Joint Conference on Languages, 2002

Mid-term course evaluations with muddy cards.
Proceedings of the 7th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, 2002

A dialog between authors and teachers.
Proceedings of the 7th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, 2002

2001
A Dynamic Programming Approach to Optimal Integrated Code Generation.
Proceedings of the 2001 ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems, 2001

Practical PRAM programming.
Wiley series on parallel and distributed computing, Wiley, 2001

2000
NestStep: Nested Parallelism and Virtual Shared Memory for the BSP Model.
J. Supercomput., 2000

Two program comprehension tools for automatic parallelization.
IEEE Concurr., 2000

1999
The SPARAMAT Approach to Automatic Comprehension of Sparse Matrix Computations.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1999

Language and library support for practical PRAM programming.
Parallel Comput., 1999

NestStep: Nested Parallelism and Virtual Memory for the BSP Model.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

The SPARAMAT Approach to Automatic Comprehension of Sparse Matrix Computations.
Proceedings of the 7th International Workshop on Program Comprehension (IWPC '99), May 5-7, 1999, 1999

1998
ForkLight: A Control-Synchronous Parallel Programming Language.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1998

Scheduling Expression DAGs for Minimal Register Need.
Comput. Lang., 1998

1997
Two Program Comprehension Tools for Automatic Parallelization: A Comparative Study.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1997

Practical PRAM Programming with Fork95 - A Tutorial.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1997

The Fork95 parallel programming language: Design, implementation, application.
Int. J. Parallel Program., 1997

Applicability of Program Comprehension to Sparse Matrix Computations.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996
Pattern-Driven Automatic Parallelization.
Sci. Program., 1996

A Library of Basic PRAM Algorithms and its Implementation in FORK.
Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996

Program comprehension engines for automatic parallelization: a comparative study.
Proceedings of the Software Engineering for Parallel and Distributed Systems, 1996

Parallel Fourier-Motzkin Elimination.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
Language Support for Synchronous Parallel Critical Sections.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1995

Integrating Synchronous and Asynchronous Paradigms: The Fork95 Parallel Programming Language.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1995

Generating Optimal Contiguous Evaluations for Expression DAGs.
Comput. Lang., 1995

Pattern-driven automatic program transformation and parallelization.
Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

Optimal Contiguous Expression DAG Evaluations.
Proceedings of the Fundamentals of Computation Theory, 10th International Symposium, 1995

1994
Integrating Scalable Parallel Libraries and Automatically Parallelizing Compilers.
Universität Trier, Mathematik/Informatik, Forschungsbericht, 1994

Automatische Parallelisierung numerischer Programme durch Mustererkennung.
PhD thesis, 1994

Knowledge-Based Automatic Parallelization by Pattern Recognition.
Proceedings of the Automatic Parallelization: New Approaches to Code Generation, 1994

1993
Efficient Register Allocation for Large Basic Blocks.
Proceedings of the Programming Language Implementation and Logic Programming, 1993

Automatic Parallelization by Pattern-Matching.
Proceedings of the Parallel Computation, 1993

1991
A Randomized Heuristic Approach to Register Allocation.
Proceedings of the Programming Language Implementation and Logic Programming, 1991

Scheduling Vector Straight Line Code on Vector Processors.
Proceedings of the Code Generation, 1991
