Manolis Katevenis

Orcid: 0009-0008-5437-4709

Affiliations:
  • Foundation for Research & Technology - Hellas, Greece


According to our database, Manolis Katevenis authored at least 72 papers between 1982 and 2024.


Bibliography

2024
Low-latency Communication in RISC-V Clusters.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

2023
The ExaNeSt Prototype: Evaluation of Efficient HPC Communication Hardware in an ARM-based Multi-FPGA Rack.
CoRR, 2023

2022
Optimized Page Fault Handling During RDMA.
IEEE Trans. Parallel Distributed Syst., 2022


2021
Using hls4ml to Map Convolutional Neural Networks on Interconnected FPGA Devices.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
PART: Pinning Avoidance in RDMA Technologies.
Proceedings of the 14th IEEE/ACM International Symposium on Networks-on-Chip, 2020

2019
Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability.
Trans. High Perform. Embed. Archit. Compil., 2019

Shall numerical astrophysics step into the era of Exascale computing?
CoRR, 2019


Implementation and Impact of an Ultra-Compact Multi-FPGA Board for Large System Prototyping.
Proceedings of the 2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing, 2019

Efficient Convolutional Neural Network Weight Compression for Space Data Classification on Multi-FPGA Platforms.
Proceedings of the IEEE International Conference on Acoustics, 2019

Towards Exascale: Measuring the Energy Footprint of Astrophysics HPC Simulations.
Proceedings of the 15th International Conference on eScience, 2019

2018
Next generation of Exascale-class systems: ExaNeSt project and the status of its interconnect and storage development.
Microprocess. Microsystems, 2018

Accurate Congestion Control for RDMA Transfers.
Proceedings of the Twelfth IEEE/ACM International Symposium on Networks-on-Chip, 2018

2017
Modeling energy-performance tradeoffs in ARM big.LITTLE architectures.
Proceedings of the 27th International Symposium on Power and Timing Modeling, 2017


2016
Discharging the Network From Its Flow Control Headaches: Packet Drops and HOL Blocking.
IEEE/ACM Trans. Netw., 2016


2015
The Combined Input-Output Queued Crossbar Architecture for High-Radix On-Chip Switches.
IEEE Micro, 2015

A Systematic Evaluation of Emerging Mesh-like CMP NoCs.
Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for networking and communications systems, 2015

2014
FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards.
J. Syst. Archit., 2014

Design space exploration for fair resource-allocated NoC architectures.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Design trade-offs in energy efficient NoC architectures.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

EUROSERVER: Energy Efficient Node for European Micro-Servers.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

2013
NP-SARC: Scalable network processing in the SARC multi-core FPGA platform.
J. Syst. Archit., 2013

Prefetching and cache management using task lifetimes.
Proceedings of the International Conference on Supercomputing, 2013

2012
LP-NUCA: Networks-in-Cache for High-Performance Low-Power Embedded Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2012

Crossbar NoCs Are Scalable Beyond 100 Nodes.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs.
Int. J. Parallel Program., 2012

Formic: Cost-efficient and Scalable Prototyping of Manycore Architectures.
Proceedings of the 2012 IEEE 20th Annual International Symposium on Field-Programmable Custom Computing Machines, 2012

2011
Distributed WFQ scheduling converging to weighted max-min fairness.
Comput. Networks, 2011

VLSI micro-architectures for high-radix crossbar schedulers.
Proceedings of the NOCS 2011, 2011

An efficient sequential iterative matching algorithm for CIOQ switches.
Proceedings of the 16th IEEE Symposium on Computers and Communications, 2011

Fine-grain OpenMP runtime support with explicit communication hardware primitives.
Proceedings of the Design, Automation and Test in Europe, 2011

2010
Explicit Communication and Synchronization in SARC.
IEEE Micro, 2010

Network Processing in Multi-core FPGAs with Integrated Cache-Network Interface.
Proceedings of the ReConFig'10: 2010 International Conference on Reconfigurable Computing and FPGAs, 2010

A 128 x 128 x 24Gb/s Crossbar Interconnecting 128 Tiles in a Single Hop and Occupying 6% of Their Area.
Proceedings of the NOCS 2010, 2010

Efficient implementation of CIOQ switches with sequential iterative matching algorithms.
Proceedings of the International Conference on Field-Programmable Technology, 2010

On-chip communication and synchronization mechanisms with cache-integrated network interfaces.
Proceedings of the 7th Conference on Computing Frontiers, 2010

End-to-end congestion management for non-blocking multi-stage switching fabrics.
Proceedings of the 2010 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2010

2009
FPGA implementation of a configurable cache/scratchpad memory with virtualized user-level RDMA capability.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

2008
Building an FoC Using Large, Buffered Crossbar Cores.
IEEE Des. Test Comput., 2008

Towards unified mechanisms for inter-processor communication.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

2007
Pipelined heap (priority queue) management for advanced scheduling in high-speed networks.
IEEE/ACM Trans. Netw., 2007

Prototyping Efficient Interprocessor Communication Mechanisms.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Approaching Ideal NoC Latency with Pre-Configured Routes.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

2006
Scheduling in Non-Blocking Buffered Three-Stage Switching Fabrics.
Proceedings of the INFOCOM 2006. 25th IEEE International Conference on Computer Communications, 2006

2005
Benes switching fabrics with O(N)-complexity internal backpressure.
IEEE Commun. Mag., 2005

Variable-size multipacket segments in buffered crossbar (CICQ) architectures.
Proceedings of IEEE International Conference on Communications, 2005

Scheduling in switches with small internal buffers.
Proceedings of the Global Telecommunications Conference, 2005. GLOBECOM '05, St. Louis, Missouri, USA, 28 November, 2005

2004
Variable packet size buffered crossbar (CICQ) switches.
Proceedings of IEEE International Conference on Communications, 2004

Multiple priorities in a two-lane buffered crossbar.
Proceedings of the Global Telecommunications Conference, 2004. GLOBECOM '04, Dallas, Texas, USA, 29 November, 2004

2002
Web-conscious storage management for web proxies.
IEEE/ACM Trans. Netw., 2002

2001
Wormhole IP over (connectionless) ATM.
IEEE/ACM Trans. Netw., 2001

Efficient per-flow queueing in DRAM at OC-192 line rate using out-of-order execution techniques.
Proceedings of the IEEE International Conference on Communications, 2001

1999
ATLAS I: implementing a single-chip ATM switch with backpressure.
IEEE Micro, 1999

The Remote Enqueue Operation on Networks of Workstations.
Informatica (Slovenia), 1999

Secondary Storage Management for Web Proxies.
Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems, 1999

1998
ATLAS I: a single-chip, gigabit ATM switch with HIC/HS links and multi-lane back-pressure.
Microprocess. Microsystems, 1998

Credit-Flow-Controlled ATM for MP Interconnection: The ATLAS I Single-Chip ATM Switch.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

The Remote Enqueue Operation on Networks of Workstations.
Proceedings of the Network-Based Parallel Computing: Communication, 1998

1997
Telegraphos: A Substrate for High-Performance Computing on Workstation Clusters.
J. Parallel Distributed Comput., 1997

User-Level DMA without Operating System Kernel Modification.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

ATLAS: A Single-Chip ATM Switch for NOWs.
Proceedings of the Communication and Architectural Support for Network-Based Parallel Computing, 1997

Pipelined Multi-Queue Management in a VLSI ATM Switch Chip with Credit-Based Flow-Control.
Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97), 1997

1996
Telegraphos: High-Performance Networking for Parallel Processing on Workstation Clusters.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

1995
Pipelined Memory Shared Buffer for VLSI Switches.
Proceedings of the ACM SIGCOMM 1995 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, Cambridge, MA, USA, August 28, 1995

1991
Weighted Round-Robin Cell Multiplexing in a General-Purpose ATM Switch Chip.
IEEE J. Sel. Areas Commun., 1991

Reducing the Branch Penalty by Rearranging Instructions in Double-Width Memory.
Proceedings of the ASPLOS-IV Proceedings, 1991

1987
Fast switching and fair control of congested flow in broadband networks.
IEEE J. Sel. Areas Commun., 1987

A Vector Hardware Accelerator with Circuit Simulation Emphasis.
Proceedings of the 24th ACM/IEEE Design Automation Conference. Miami Beach, FL, USA, June 28, 1987

1982
A RISCy approach to VLSI.
SIGARCH Comput. Archit. News, 1982

