Ben H. H. Juurlink

According to our database, Ben H. H. Juurlink
  • authored at least 129 papers between 1993 and 2018.
  • has a "Dijkstra number" of three.

Bibliography

2018
Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU.
Sig. Proc.: Image Comm., 2018

Local memory-aware kernel perforation.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

2017
GPU Parallelization of HEVC In-Loop Filters.
International Journal of Parallel Programming, 2017

Application-Specific Cache and Prefetching for HEVC CABAC Decoding.
IEEE MultiMedia, 2017

The LPGPU2 Project: Low-Power Parallel Computing on GPUs: Extended Abstract.
Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, 2017

Stencil Autotuning with Ordinal Regression: Extended Abstract.
Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, 2017

E^2MC: Entropy Encoding Based Memory Compression for GPUs.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Autotuning Stencil Computations with Structural Ordinal Regression Learning.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Syntax Element Partitioning for high-throughput HEVC CABAC decoding.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A Methodology for Predicting Application-Specific Achievable Memory Bandwidth for HW/SW-Codesign.
Proceedings of the Euromicro Conference on Digital System Design, 2017

Enabling GPU software developers to optimize their applications - The LPGPU2 approach.
Proceedings of the 2017 Conference on Design and Architectures for Signal and Image Processing, 2017

Static optimization in PHP 7.
Proceedings of the 26th International Conference on Compiler Construction, 2017

A Quantitative Analysis of the Memory Architecture of FPGA-SoCs.
Proceedings of the Applied Reconfigurable Computing - 13th International Symposium, 2017

2016
An evaluation of current SIMD programming models for C++.
Proceedings of the 3rd Workshop on Programming Models for SIMD/Vector Processing, 2016

Efficient HEVC decoder for heterogeneous CPU with GPU systems.
Proceedings of the 18th IEEE International Workshop on Multimedia Signal Processing, 2016

ALUPower: Data Dependent Power Consumption in GPUs.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

FPGA based hardware accelerator for KAZE feature extraction algorithm.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

The neuro vector engine: Flexibility to improve convolutional net efficiency for wearable vision.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015
Parallel H.264/AVC Motion Compensation for GPUs Using OpenCL.
IEEE Trans. Circuits Syst. Video Techn., 2015

SIMD Acceleration for HEVC Decoding.
IEEE Trans. Circuits Syst. Video Techn., 2015

Spatiotemporal SIMT and Scalarization for Improving GPU Efficiency.
TACO, 2015

Reducing HEVC encoding complexity using two-stage motion estimation.
Proceedings of the 2015 Visual Communications and Image Processing, 2015

On latency in GPU throughput microarchitectures.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-Specific Prefetching.
Proceedings of the 2015 IEEE International Symposium on Multimedia, 2015

Nexus#: A Distributed Hardware Task Manager for Task-Based Programming Models.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

An Efficient and Flexible FPGA Implementation of a Face Detection System (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

High Performance Memory Accesses on FPGA-SoCs: A Quantitative Analysis.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

Multi/many-core programming: where are we standing?
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

An Efficient and Flexible FPGA Implementation of a Face Detection System.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
Low-Power High-Efficiency Video Decoding using General-Purpose Processors.
TACO, 2014

TACO: A scheduling scheme for parallel applications on multicore architectures.
Scientific Programming, 2014

GPGPU workload characteristics and performance analysis.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

An Integrated Hardware-Software Approach to Task Graph Management.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

A generic implementation of a quantified predictor on FPGAs.
Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

2013
Parallel HEVC Decoding on Multi- and Many-core Architectures - A Power and Performance Analysis.
Signal Processing Systems, 2013

How a single chip causes massive power bills GPUSimPow: A GPGPU power simulator.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

2012
Scalable Parallel Programming Applied to H.264/AVC Decoding.
Springer Briefs in Computer Science, Springer, ISBN: 978-1-4614-2230-3, 2012

Parallel Scalability and Efficiency of HEVC Parallelization Approaches.
IEEE Trans. Circuits Syst. Video Techn., 2012

Amdahl's law for predicting the future of multicores considered harmful.
SIGARCH Computer Architecture News, 2012

Using OpenMP superscalar for parallelization of embedded and consumer applications.
Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

Programming parallel embedded and consumer applications in OpenMP superscalar.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

SynZEN: A hybrid TTA/VLIW architecture with a distributed register file.
Proceedings of the NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012

Hardware-Based Task Dependency Resolution for the StarSs Programming Model.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Improving the parallelization efficiency of HEVC decoding.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Parallel video decoding in the emerging HEVC standard.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

An Optimized Parallel IDCT on Graphics Processing Units.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

A Predictor-Based Power-Saving Policy for DRAM Memories.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

2011
A Highly Scalable Parallel Implementation of H.264.
Trans. HiPEAC, 2011

Multi-Core - the Future of Embedded Systems.
Proceedings of the Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen (MBMV), 2011

Poster: implications of merging phases on scalability of multi-core architectures.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

A QHD-capable parallel H.264 decoder.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Implications of Merging Phases on Scalability of Multi-core Architectures.
Proceedings of the International Conference on Parallel Processing, 2011

Nexus: Hardware Support for Task-Based Programming.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Composable local memory organisation for streaming applications on embedded MPSoCs.
Proceedings of the 8th Conference on Computing Frontiers, 2011

An Instruction to Accelerate Software Caches.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010
The SARC Architecture.
IEEE Micro, 2010

A Multidimensional Software Cache for Scratchpad-Based Systems.
IJERTCS, 2010

Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine.
Proceedings of the 24th International Conference on Supercomputing, 2010

Extending the Cell SPE with Energy Efficient Branch Prediction.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A Case for Hardware Task Management Support for the StarSS Programming Model.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

Instruction precomputation with memoization for fault detection.
Proceedings of the Design, Automation and Test in Europe, 2010

Protective redundancy overhead reduction using instruction vulnerability factor.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Parallel Scalability of Video Decoders.
Signal Processing Systems, 2009

Leakage-Aware Multiprocessor Scheduling.
Signal Processing Systems, 2009

Instruction-Level Fault Tolerance Configurability.
Signal Processing Systems, 2009

Scalability of Macroblock-level Parallelism for H.264 Decoding.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Intra-vector SIMD instructions for core specialization.
Proceedings of the 27th International Conference on Computer Design, 2009

Parallel H.264 Decoding on an Embedded Multicore Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Introduction.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

SIMD Architectural Enhancements to Improve the Performance of the 2D Discrete Wavelet Transform.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

Instruction Precomputation for Fault Detection.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

Limiting the number of dirty cache lines.
Proceedings of the Design, Automation and Test in Europe, 2009

Specialization of the Cell SPE for Media Applications.
Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

Scalar Processing Overhead on SIMD-Only Architectures.
Proceedings of the 20th IEEE International Conference on Application-Specific Systems, 2009

Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices.
Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009

2008
Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors.
IEEE Trans. Multimedia, 2008

Versatility of extended subwords and the matrix register file.
TACO, 2008

GRAAL: A Framework for Low-Power 3D Graphics Accelerators.
IEEE Computer Graphics and Applications, 2008

Optimization of Content-Based Image Retrieval Functions.
Proceedings of the Tenth IEEE International Symposium on Multimedia (ISM2008), 2008

Analysis of video filtering on the cell processor.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

(When) Will CMPs Hit the Power Wall?
Proceedings of the Euro-Par 2008 Workshops, 2008

Analyzing Scalability of Deblocking Filter of H.264 via TLP Exploitation in a New Many-Core Architecture.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

A Low-Cost Cache Coherence Verification Method for Snooping Systems.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Memory copies in multi-level memory systems.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

2007
Trade-Offs Between Voltage Scaling and Processor Shutdown for Low-Energy Embedded Multiprocessors.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Instruction-Level Fault Tolerance Configurability.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Optimizing Cache Performance of the Discrete Wavelet Transform Using a Visualization Tool.
Proceedings of the Ninth IEEE International Symposium on Multimedia, 2007

SIMD Vectorization of Histogram Functions.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

2006
Avoiding Conversion and Rearrangement Overhead in SIMD Architectures.
International Journal of Parallel Programming, 2006

Accelerating Color Space Conversion Using Extended Subwords and the Matrix Register File.
Proceedings of the Eighth IEEE International Symposium on Multimedia (ISM 2006), 2006

Leakage-aware multiprocessor scheduling for low power.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Improving the memory behavior of vertical filtering in the discrete wavelet transform.
Proceedings of the Third Conference on Computing Frontiers, 2006

Limitations of special-purpose instructions for similarity measurements in media SIMD extensions.
Proceedings of the 2006 International Conference on Compilers, 2006

2005
The CSI multimedia architecture.
IEEE Trans. VLSI Syst., 2005

Avoiding data conversions in embedded media processors.
Proceedings of the 2005 ACM Symposium on Applied Computing (SAC), 2005

Implementing Hardware Multithreading in a VLIW Architecture.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2005

Matrix register file and extended subwords: two techniques for embedded media processors.
Proceedings of the Second Conference on Computing Frontiers, 2005

Performance Comparison of SIMD Implementations of the Discrete Wavelet Transform.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

2004
Memory Bandwidth Requirements of Tile-Based Rendering.
Proceedings of the Computer Systems: Architectures, 2004

GraalBench: a 3D graphics benchmark suite for mobile phones.
Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

Sparse Matrix Transpose Unit.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Scene Management Models and Overlap Tests for Tile-Based Rendering.
Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

Reducing traffic generated by conflict misses in caches.
Proceedings of the First Conference on Computing Frontiers, 2004

Dynamic techniques to reduce memory traffic in embedded systems.
Proceedings of the First Conference on Computing Frontiers, 2004

Approximating the optimal replacement algorithm.
Proceedings of the First Conference on Computing Frontiers, 2004

Accelerating the secure remote password protocol using reconfigurable hardware.
Proceedings of the First Conference on Computing Frontiers, 2004

2003
The Paderborn University BSP (PUB) library.
Parallel Computing, 2003

Implementation of a streaming execution unit.
Journal of Systems Architecture, 2003

Optimal broadcast on parallel locality models.
J. Discrete Algorithms, 2003

Unified Dual Data Caches.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

2002
Architectural Support for 3D Graphics in the Complex Streamed Instruction Set.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002

Performance Scalability of Multimedia Instruction Set Extensions.
Proceedings of the Euro-Par 2002, 2002

Implementation of a Streaming Execution Unit.
Proceedings of the 2002 Euromicro Symposium on Digital Systems Design (DSD 2002), 2002

2001
Performance of the Complex Streamed Instruction Set on Image Processing Kernels.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Implementation and Evaluation of the Complex Streamed Instruction Set.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000
Optimal broadcast on parallel locality models.
Proceedings of the SIROCCO 7, 2000

Complex Streamed Instructions: Introduction and Initial Evaluation.
Proceedings of the 26th EUROMICRO 2000 Conference, 2000

Counter Based Superscalar Instruction Issuing.
Proceedings of the 26th EUROMICRO 2000 Conference, 2000

1999
The Paderborn University BSP (PUB) Library - Design, Implementation and Performance.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1998
Gossiping on Meshes and Tori.
IEEE Trans. Parallel Distrib. Syst., 1998

A Quantitative Comparison of Parallel Computation Models.
ACM Trans. Comput. Syst., 1998

Communication-Optimal Parallel Minimum Spanning Tree Algorithms (Extended Abstract).
SPAA, 1998

Experimental Validation of Parallel Computation Models on the Intel Paragon.
IPPS/SPDP, 1998

1996
Communication Primitives for BSP Computers.
Inf. Process. Lett., 1996

A Quantitative Comparison of Parallel Computation Models.
SPAA, 1996

The E-BSP Model: Incorporating General Locality and Unbalanced Communication into the BSP Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Worm-Hole Gossiping on Meshes.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1994
The Parallel Hierarchical Memory Model.
Proceedings of the Algorithm Theory, 1994

1993
Experiences with a Model for Parallel Computation.
Proceedings of the Twelfth Annual ACM Symposium on Principles of Distributed Computing, 1993
