Vasilios I. Kelefouras

Orcid: 0000-0001-9591-913X

According to our database, Vasilios I. Kelefouras authored at least 41 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices.
Int. J. Parallel Program., April, 2024

2023
Design and Implementation of Deep Learning 2D Convolutions on Modern CPUs.
IEEE Trans. Parallel Distributed Syst., December, 2023

Contactless Camera-Based Heart Rate and Respiratory Rate Monitoring Using AI on Hardware.
Sensors, 2023

SDN-Based Routing Framework for Elephant and Mice Flows Using Unsupervised Machine Learning.
Network, 2023

Towards Highly Compressed CNN Models for Human Activity Recognition in Wearable Devices.
Proceedings of the Signal Processing: Algorithms, 2023

A Comparative Study of Neural Network Compilers on ARMv8 Architecture.
Proceedings of the Architecture of Computing Systems - 36th International Conference, 2023

2022
Design and Implementation of 2D Convolution on x86/x64 Processors.
IEEE Trans. Parallel Distributed Syst., 2022

Workflow simulation and multi-threading aware task scheduling for heterogeneous computing.
J. Parallel Distributed Comput., 2022

A Methodology for Efficient Tile Size Selection for Affine Loop Kernels.
Int. J. Parallel Program., 2022

Anatomy of Deep Learning Image Classification and Object Detection on Commercial Edge Devices: A Case Study on Face Mask Detection.
IEEE Access, 2022

A Design Space Exploration Methodology for Enabling Tensor Train Decomposition in Edge Devices.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2022

Safety by Construction: Pattern-Based Application of Safety Mechanisms in XANDAR.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Evaluation of Language Runtimes in Open-source Serverless Platforms.
Proceedings of the 12th International Conference on Cloud Computing and Services Science, 2022

2021
An Analytical Model for Loop Tiling Transformation.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

Unsupervised Machine Learning-Based Elephant and Mice Flow Identification.
Proceedings of the Intelligent Computing, 2021

A Hierarchical Profiler of Intermediate Representation Code based on LLVM.
Proceedings of the 10th Mediterranean Conference on Embedded Computing, 2021

2019
A methodology correlating code optimizations with data memory accesses, execution time and energy consumption.
J. Supercomput., 2019

2018
Combining Software Cache Partitioning and Loop Tiling for Effective Shared Cache Management.
ACM Trans. Embed. Comput. Syst., 2018

Workflow Simulation Aware and Multi-threading Effective Task Scheduling for Heterogeneous Computing.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

A methodology for efficient code optimizations and memory management.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details.
Computing, 2017

Cache Partitioning + Loop Tiling: A Methodology for Effective Shared Cache Management.
Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

2016
Array Size Computation under Uniform Overlapping and Irregular Accesses.
ACM Trans. Design Autom. Electr. Syst., 2016

A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures.
J. Supercomput., 2016

Area-Throughput Trade-Offs for SHA-1 and SHA-256 Hash Functions' Pipelined Designs.
J. Circuits Syst. Comput., 2016

2015
A methodology for speeding up matrix vector multiplication for single/multi-core architectures.
J. Supercomput., 2015

A methodology for speeding up loop kernels by exploiting the software information and the memory architecture.
Comput. Lang. Syst. Struct., 2015

2014
A Methodology for Speeding up MVM for Regular, Toeplitz and Bisymmetric Toeplitz Matrices.
J. Signal Process. Syst., 2014

A Matrix-Matrix Multiplication methodology for single/multi-core architectures using SIMD.
J. Supercomput., 2014

A methodology for speeding up edge and line detection algorithms focusing on memory architecture utilization.
J. Supercomput., 2014

A scalable and near-optimal representation of access schemes for memory management.
ACM Trans. Archit. Code Optim., 2014

2013
Near-optimal and scalable intrasignal in-place optimization for non-overlapping and irregular access schemes.
ACM Trans. Design Autom. Electr. Syst., 2013

Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints.
ACM Trans. Archit. Code Optim., 2013

A systematic approach to classify design-time global scheduling techniques.
ACM Comput. Surv., 2013

2012
On the exploitation of a high-throughput SHA-256 FPGA design for HMAC.
ACM Trans. Reconfigurable Technol. Syst., 2012

A data locality methodology for matrix-matrix multiplication algorithm.
J. Supercomput., 2012

A template-based methodology for efficient microprocessor and FPGA accelerator co-design.
Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

2011
A Methodology for Speeding Up Fast Fourier Transform Focusing on Memory Architecture Utilization.
IEEE Trans. Signal Process., 2011