Vassilis Papaefstathiou

Orcid: 0000-0002-5443-6470

  • University of Crete, Department of Computer Science, Heraklion, Greece
  • Foundation for Research & Technology - Hellas (FORTH), Heraklion, Greece

According to our database, Vassilis Papaefstathiou authored at least 55 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.




Low-latency Communication in RISC-V Clusters.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

The ExaNeSt Prototype: Evaluation of Efficient HPC Communication Hardware in an ARM-based Multi-FPGA Rack.
CoRR, 2023

Software Development Vehicles to Enable Extended and Early Co-design: A RISC-V and HPC Case of Study.
Proceedings of the High Performance Computing, 2023

Short Reasons for Long Vectors in HPC CPUs: A Study Based on RISC-V.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Optimized Page Fault Handling During RDMA.
IEEE Trans. Parallel Distributed Syst., 2022

HighwayNoC: Approaching Ideal NoC Performance With Dual Data Rate Routers.
IEEE/ACM Trans. Netw., 2021

UNILOGIC: A Novel Architecture for Highly Parallel Reconfigurable Systems.
ACM Trans. Reconfigurable Technol. Syst., 2020

On Architectural Support for Instruction Set Randomization.
ACM Trans. Archit. Code Optim., 2020

Proceedings of the Thirteenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2020).
CoRR, 2020

Performance and Energy Footprint Assessment of FPGAs and GPUs on HPC Systems Using Astrophysics Application.
Comput., 2020

PART: Pinning Avoidance in RDMA Technologies.
Proceedings of the 14th IEEE/ACM International Symposium on Networks-on-Chip, 2020

Hybrid2: Combining Caching and Migration in Hybrid Memory Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability.
Trans. High Perform. Embed. Archit. Compil., 2019

Decoupled Fused Cache: Fusing a Decoupled LLC with a DRAM Cache.
ACM Trans. Archit. Code Optim., 2019

Direct N-body application on low-power and energy-efficient parallel architectures.
CoRR, 2019

Implementation and Impact of an Ultra-Compact Multi-FPGA Board for Large System Prototyping.
Proceedings of the 2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing, 2019

Direct N-Body Application on Low-Power and Energy-Efficient Parallel Architectures.
Proceedings of the Parallel Computing: Technology Trends, 2019

LLC-Guided Data Migration in Hybrid Memory Systems.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Global Dead-Block Management for Task-Parallel Programs.
ACM Trans. Archit. Code Optim., 2018

DDRNoC: Dual Data-Rate Network-on-Chip.
ACM Trans. Archit. Code Optim., 2018

FreewayNoC: A DDR NoC with Pipeline Bypassing.
Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018

ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

FusionCache: Using LLC tags for DRAM cache.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures.
ACM Trans. Archit. Code Optim., 2017

Runtime-Assisted Global Cache Management for Task-Based Parallel Programs.
IEEE Comput. Archit. Lett., 2017

Modeling energy-performance tradeoffs in ARM big.LITTLE architectures.
Proceedings of the 27th International Symposium on Power and Timing Modeling, 2017

Odd-ECC: on-demand DRAM error correcting codes.
Proceedings of the International Symposium on Memory Systems, 2017

Adaptive Row Addressing for Cost-Efficient Parallel Memory Protocols in Large-Capacity Memories.
Proceedings of the Second International Symposium on Memory Systems, 2016

RADAR: Runtime-assisted dead region management for last-level caches.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

ECOSCALE: Reconfigurable computing and runtime system for future exascale systems.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

A Systematic Evaluation of Emerging Mesh-like CMP NoCs.
Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for networking and communications systems, 2015

Architectural support for software-guided energy reduction of manycore communication.
PhD thesis, 2014

FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards.
J. Syst. Archit., 2014

Design space exploration for fair resource-allocated NoC architectures.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Design trade-offs in energy efficient NoC architectures.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

NP-SARC: Scalable network processing in the SARC multi-core FPGA platform.
J. Syst. Archit., 2013

Prefetching and cache management using task lifetimes.
Proceedings of the International Conference on Supercomputing, 2013

ASIST: architectural support for instruction set randomization.
Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, 2013

Formic: Cost-efficient and Scalable Prototyping of Manycore Architectures.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

Fine-grain OpenMP runtime support with explicit communication hardware primitives.
Proceedings of the Design, Automation and Test in Europe, 2011

Explicit Communication and Synchronization in SARC.
IEEE Micro, 2010

Network Processing in Multi-core FPGAs with Integrated Cache-Network Interface.
Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010

FPGA implementation of a configurable cache/scratchpad memory with virtualized user-level RDMA capability.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Prototyping Efficient Interprocessor Communication Mechanisms.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Memory-Efficient 5D Packet Classification At 40 Gbps.
Proceedings of the INFOCOM 2007. 26th IEEE International Conference on Computer Communications, 2007

Optimization and bottleneck analysis of network block I/O in commodity storage systems.
Proceedings of the 21th Annual International Conference on Supercomputing, 2007

Efficient remote block-level I/O over an RDMA-capable NIC.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

An innovative low-cost Classification Scheme for combined multi-Gigabit IP and Ethernet Networks.
Proceedings of IEEE International Conference on Communications, 2006

A hardware-engine for layer-2 classification in low-storage, ultra-high bandwidth environments.
Proceedings of the Conference on Design, Automation and Test in Europe: Designers' Forum, 2006

Experiences from Debugging a PCIX-based RDMA-capable NIC.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

A Memory Efficient, 100 Gb/sec MAC Classification Engine.
Proceedings of the 30th Annual IEEE Conference on Local Computer Networks (LCN 2005), 2005

Design-space exploration of the most widely used cryptography algorithms.
Microprocess. Microsystems, 2004
