H. Peter Hofstee

  • TU Delft, The Netherlands
  • IBM Research Austin, TX, USA

According to our database, H. Peter Hofstee authored at least 71 papers between 1990 and 2022.

Collaborative distances:
  • Dijkstra number of two.
  • Erdős number of three.



Benchmarking Apache Arrow Flight - A wire-speed protocol for data transfer, querying and microservices.
CoRR, 2022

Generating High-Performance FPGA Accelerator Designs for Big Data Analytics with Fletcher and Apache Arrow.
J. Signal Process. Syst., 2021

Low Latency and High Throughput Write-Ahead Logging Using CAPI-Flash.
IEEE Trans. Cloud Comput., 2021

AutoReCon: Neural Architecture Search-based Reconstruction for Data-free Compression.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

An Attention Module for Convolutional Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2021, 2021

An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic.
J. Signal Process. Syst., 2020

In-memory database acceleration on FPGAs: a survey.
VLDB J., 2020

Tydi: An Open Specification for Complex Data Structures Over Hardware Streams.
IEEE Micro, 2020

SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks.
CoRR, 2020

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework.
BMC Genom., 2020

REAF: Reducing Approximation of Channels by Reducing Feature Reuse Within Convolution.
IEEE Access, 2020

ThymesisFlow: A Software-Defined, HW/SW co-Designed Interconnect Stack for Rack-Scale Memory Disaggregation.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

NASB: Neural Architecture Search for Binary Convolutional Neural Networks.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

Battling the CPU Bottleneck in Apache Parquet to Arrow Conversion Using FPGA.
Proceedings of the International Conference on Field-Programmable Technology, 2020

Video-Text Compliance: Activity Verification Based on Natural Language Instructions.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Fletcher: A Framework to Efficiently Integrate FPGA Accelerators with Apache Arrow.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

A Fine-Grained Parallel Snappy Decompressor for FPGAs Using a Relaxed Execution Model.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Refine and Recycle: A Method to Increase Decompression Parallelism.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019

Supporting Columnar In-memory Formats on FPGA: The Hardware Design of Fletcher for Apache Arrow.
Proceedings of the Applied Reconfigurable Computing - 15th International Symposium, 2019

A hardware compilation framework for text analytics queries.
J. Parallel Distributed Comput., 2018

A 64-GB Sort at 28 GB/s on a 4-GPU POWER9 Node for Uniformly-Distributed 16-Byte Records with 8-Byte Keys.
Proceedings of the High Performance Computing, 2018

A high-bandwidth snappy decompressor in reconfigurable logic: work-in-progress.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

CAPI-Flash Accelerated Persistent Read Cache for Apache Cassandra.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator.
Proc. VLDB Endow., 2017

Analyzing In-Memory Hash Join: Granularity Matters.
Proceedings of the International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, 2017

SparkGA: A Spark Framework for Cost Effective, Fast and Accurate DNA Analysis at Scale.
Proceedings of the 8th ACM International Conference on Bioinformatics, 2017

PATer: A Hardware Prefetching Automatic Tuner on IBM POWER8 Processor.
IEEE Comput. Archit. Lett., 2016

RAW 2016 Keynotes.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Optimized Durable Commitlog for Apache Cassandra Using CAPI-Flash.
Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

Feature detection for image analytics via FPGA acceleration.
IBM J. Res. Dev., 2015

Second-Generation Big Data Systems.
Computer, 2015

Giving Text Analytics a Boost.
IEEE Micro, 2014

Hardware-accelerated text analytics.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

True hardware random number generation implemented in the 32-nm SOI POWER7+ processor.
IBM J. Res. Dev., 2013

Understanding system design for Big Data workloads.
IBM J. Res. Dev., 2013

Big Data text-oriented benchmark creation for Hadoop.
IBM J. Res. Dev., 2013

Cell Broadband Engine Processor.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Heterogeneous Multi-core Processors: The Cell Broadband Engine.
Proceedings of the Multicore Processors and Systems, 2009

The Next 25 Years of Computer Architecture?
Proceedings of the Euro-Par 2009, 2009

HPPC 2009 Panel: Are Many-Core Computer Vendors on Track?
Proceedings of the Euro-Par 2009, 2009

Rome Reborn.
Proceedings of the International Conference on Computer Graphics and Interactive Techniques, 2008

Cell Broadband Engine processor vault security architecture.
IBM J. Res. Dev., 2007

IBM J. Res. Dev., 2007

Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI.
IBM J. Res. Dev., 2007

The future of multi-core technologies.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Cell Broadband Engine Processor Design Methodology.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

Synergistic Processing in Cell's Multicore Architecture.
IEEE Micro, 2006

Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor.
IEEE J. Solid State Circuits, 2006

The microarchitecture of the synergistic processor for a cell processor.
IEEE J. Solid State Circuits, 2006

Invited speakers II - Real-time supercomputing and technology for games and entertainment.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Key features of the design methodology enabling a multi-core SoC implementation of a first-generation CELL processor.
Proceedings of the 2006 Conference on Asia South Pacific Design Automation: ASP-DAC 2006, 2006

Introduction to the Cell multiprocessor.
IBM J. Res. Dev., 2005

Communication and Synchronization in the Cell Processor - Invited Talk.
Proceedings of the 28th Communicating Process Architectures Conference, 2005

Power Efficient Processor Architecture and The Cell Processor.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Hardware and software architectures for the CELL processor.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

The design methodology and implementation of a first-generation CELL processor: a multi-core SoC.
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005

Power-Constrained Microprocessor Design.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Timed circuit verification using TEL structures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

Derivation of a rotator circuit with homogeneous interconnect.
Inf. Process. Lett., 2001

Custom circuit design as a driver of microprocessor performance.
IBM J. Res. Dev., 2000

"Timing closure by design," a high frequency microprocessor design methodology.
Proceedings of the 37th Conference on Design Automation, 2000

Verification of Delayed-Reset Domino Circuits Using ATACS.
Proceedings of the 5th International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC '99), 1999

Designing for a gigahertz [guTS integer processor].
IEEE Micro, 1998

High-Speed Serializing/De-Serializing Design-For-Test Method for Evaluating a 1 GHz Microprocessor.
Proceedings of the 16th IEEE VLSI Test Symposium (VTS '98), 28 April, 1998

A 690 ps read-access latency register file for a GHz integer microprocessor.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

Design methodology for a 1.0 GHz microprocessor.
Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

Circuits and Microarchitecture for Gigahertz VLSI Designs.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997

Distributing a Class of Sequential Programs.
Sci. Comput. Program., 1994

A Distributed Implementation of a Task Pool.
Proceedings of the Research Directions in High-Level Parallel Programming Languages, 1991

Distributed Sorting.
Sci. Comput. Program., 1990