Manuel Prieto

Orcid: 0000-0003-0687-3737

Affiliations:
  • Complutense University of Madrid, Spain


According to our database, Manuel Prieto authored at least 145 papers between 1997 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Exploiting Elasticity via OS-Runtime Cooperation to Improve CPU Utilization in Multicore Systems.
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024

2023
Divide&Content: A Fair OS-Level Resource Manager for Contention Balancing on NUMA Multicores.
IEEE Trans. Parallel Distributed Syst., November, 2023

Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing.
CoRR, 2023

Flexible system software scheduling for asymmetric multicore systems with PMCSched: A case for Intel Alder Lake.
Concurr. Comput. Pract. Exp., 2023

Comparing Performance and Portability Between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs.
Proceedings of the 35th IEEE International Symposium on Computer Architecture and High Performance Computing, 2023

PERCIVAL: Deploying Posits and Quire Arithmetic into the CVA6 RISC-V Core.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
PERCIVAL: Open-Source Posit RISC-V Core With Quire Capability.
IEEE Trans. Emerg. Top. Comput., 2022

LFOC+: A Fair OS-Level Cache-Clustering Policy for Commodity Multicore Systems.
IEEE Trans. Computers, 2022

Evaluation of Intel's DPC++ Compatibility Tool in heterogeneous computing.
J. Parallel Distributed Comput., 2022

Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment.
CoRR, 2022

Migrating CUDA to oneAPI: A Smith-Waterman Case Study.
Proceedings of the Bioinformatics and Biomedical Engineering, 2022

Rapid Development of OS Support with PMCSched for Scheduling on Asymmetric Multicore Systems.
Proceedings of the Euro-Par 2022: Parallel Processing Workshops, 2022

Customizing the CVA6 RISC-V Core to Integrate Posit and Quire Instructions.
Proceedings of the 37th Conference on Design of Circuits and Integrated Systems, 2022

Evaluation of the Intel thread director technology on an Alder Lake processor.
Proceedings of the APSys '22: 13th ACM SIGOPS Asia-Pacific Workshop on Systems, Virtual Event, Singapore, August 23, 2022

2020
STEEL-RT: combining single task-single executor model and expanded scheduling to ease heterogeneity exploitation.
J. Supercomput., 2020

HEVC optimization based on human perception for real-time environments.
Multim. Tools Appl., 2020

PBBCache: An open-source parallel simulator for rapid prototyping and evaluation of cache-partitioning and cache-clustering policies.
J. Comput. Sci., 2020

LiveChess2FEN: a Framework for Classifying Chess Pieces based on CNNs.
CoRR, 2020

Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019
Variable intra-task threading for power-constrained performance and energy optimization in DAG scheduling.
J. Supercomput., 2019

Portability Study of an OpenCL Algorithm for Automatic Target Detection in Hyperspectral Images.
IEEE Trans. Geosci. Remote. Sens., 2019

SWIMM 2.0: Enhanced Smith-Waterman on Intel's Multicore and Manycore Architectures Based on AVX-512 Vector Extensions.
Int. J. Parallel Program., 2019

LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Contention-Aware Fair Scheduling for Asymmetric Single-ISA Multicore Systems.
IEEE Trans. Computers, 2018

Portable real-time DCT-based steganography using OpenCL.
J. Real Time Image Process., 2018

OSWALD.
Int. J. High Perform. Comput. Appl., 2018

Complexity reduction in the HEVC/H265 standard based on smooth region classification.
Digit. Signal Process., 2018

On the Interplay Between Throughput, Fairness and Energy Efficiency on Asymmetric Multicore Processors.
Comput. J., 2018

Reuse Detector: Improving the Management of STT-RAM SLLCs.
Comput. J., 2018

SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences.
BMC Syst. Biol., 2018

A CPU-GPU Parallel Ant Colony Optimization Solver for the Vehicle Routing Problem.
Proceedings of the Applications of Evolutionary Computation, 2018

2017
Parallel Implementation of a Full Hyperspectral Unmixing Chain Using OpenCL.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2017

Performance-Power Evaluation of an OpenCL Implementation of the Simplex Growing Algorithm for Hyperspectral Unmixing.
IEEE Geosci. Remote. Sens. Lett., 2017

Towards completely fair scheduling on asymmetric single-ISA multicore processors.
J. Parallel Distributed Comput., 2017

First Experiences Optimizing Smith-Waterman on Intel's Knights Landing Processor.
CoRR, 2017

PMCTrack: Delivering Performance Monitoring Counter Support to the OS Scheduler.
Comput. J., 2017

Accelerating Smith-Waterman Alignment of Long DNA Sequences with OpenCL on FPGA.
Proceedings of the Bioinformatics and Biomedical Engineering, 2017

Performance and Scalability Study of FMM Kernels on Novel Multi- and Many-core Architectures.
Proceedings of the International Conference on Computational Science, 2017

First Experiences Accelerating Smith-Waterman on Intel's Knights Landing Processor.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017

Delivering Fairness on Asymmetric Multicore Systems via Contention-Aware Scheduling.
Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

2016
Code obfuscation using very long identifiers for FFT motion estimation models in embedded processors.
J. Real Time Image Process., 2016

Parallel implementation of a hyperspectral data geometry-based estimation of number of endmembers algorithm.
Proceedings of the Real-Time Image and Video Processing 2016, 2016

Parallel implementation of the simplex growing algorithm for hyperspectral unmixing using OpenCL.
Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium, 2016

HeSP: A Simulation Framework for Solving the Task Scheduling-Partitioning Problem on Heterogeneous Architectures.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures.
J. Comput. Sci., 2015

A power measurement environment for PCIe accelerators.
Comput. Sci. Res. Dev., 2015

An energy-aware performance analysis of SWIMM: Smith-Waterman implementation on Intel's Multicore and Manycore architectures.
Concurr. Comput. Pract. Exp., 2015

Non-negative Matrix Factorization on Low-Power Architectures and Accelerators: A Comparative Study.
Comput. Electr. Eng., 2015

Smith-Waterman Protein Search with OpenCL on an FPGA.
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015

ACFS: a completely fair scheduler for asymmetric single-isa multicore systems.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

OpenACC-based GPU acceleration of an optical flow algorithm.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Parallel trajectory synchronization for aircraft conflicts resolution.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Customized Nios II multi-cycle instructions to accelerate block-matching techniques.
Proceedings of the Real-Time Image and Video Processing 2015, 2015

Fast-coding robust motion estimation model in a GPU.
Proceedings of the Real-Time Image and Video Processing 2015, 2015

Early Experiences with OpenCL on FPGAs: Convolution Case Study.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

An OS-Oriented Performance Monitoring Tool for Multicore Systems.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

A fast parallel hyperspectral coded aperture algorithm for compressive sensing using OpenCL.
Proceedings of the IEEE EUROCON 2015, 2015

2014
Fast finite difference Poisson solvers on heterogeneous architectures.
Comput. Phys. Commun., 2014

Accelerating Solid-fluid Interaction using Lattice-boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms.
Proceedings of the International Conference on Computational Science, 2014

Exploring the Throughput-Fairness Trade-off on Asymmetric Multicore Systems.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Smith-Waterman algorithm on heterogeneous systems: A case study.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Survey of Energy-Cognizant Scheduling Techniques.
IEEE Trans. Parallel Distributed Syst., 2013

System-level memory management based on statistical variability compensation for frame-based applications.
ACM Trans. Embed. Comput. Syst., 2013

Offset Printing Plate Quality Sensor on a Low-Cost Processor.
Sensors, 2013

Robust motion estimation on a low-power multi-core DSP.
EURASIP J. Adv. Signal Process., 2013

Acceleration of block-matching algorithms using a custom instruction-based paradigm on a Nios II microprocessor.
EURASIP J. Adv. Signal Process., 2013

Multi-GPU based on multicriteria optimization for motion estimation system.
EURASIP J. Adv. Signal Process., 2013

GPU-based acceleration of bio-inspired motion estimation model.
Concurr. Comput. Pract. Exp., 2013

Implementation of a Low-Cost Mobile Devices to Support Medical Diagnosis.
Comput. Math. Methods Medicine, 2013

Range query processing on single and multi GPU environments.
Comput. Electr. Eng., 2013

Delivering fairness and priority enforcement on asymmetric multicore systems via OS scheduling.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Non-negative matrix factorization on low-power architectures: a comparative study.
Proceedings of the 20th European MPI Users's Group Meeting, 2013

Multi-level Clustering on Metric Spaces Using a Multi-GPU Platform.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
Leveraging Core Specialization via OS Scheduling to Improve Performance on Asymmetric Multicore Systems.
ACM Trans. Comput. Syst., 2012

A Low Cost Matching Motion Estimation Sensor Based on the NIOS II Microprocessor.
Sensors, 2012

Survey of scheduling techniques for addressing shared resources in multicore processors.
ACM Comput. Surv., 2012

Block Tridiagonal Solvers on Heterogeneous Architectures.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Range Query Processing in a Multi-GPU Environment.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

2011
Leveraging workload diversity through OS scheduling to maximize performance on single-ISA heterogeneous multicore systems.
J. Parallel Distributed Comput., 2011

Hybrid timing-address oriented load-store queue filtering for an x86 architecture.
IET Comput. Digit. Tech., 2011

Parallelism on the Nonnegative Matrix Factorization.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Biclustering and classification analysis in gene expression using Nonnegative Matrix Factorization on multi-GPU systems.
Proceedings of the 11th International Conference on Intelligent Systems Design and Applications, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

kNN Query Processing in Metric Spaces Using GPUs.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
Improving face recognition by combination of natural and Gabor faces.
Pattern Recognit. Lett., 2010

On-Line Multi-Threaded Processing of Web User-Clicks on Multi-Core Processors.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

A comprehensive scheduler for asymmetric multicore systems.
Proceedings of the European Conference on Computer Systems, 2010

Statistical approach in a system level methodology to deal with process variation.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

Building efficient multi-threaded search nodes.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
Replacing Associative Load Queues: A Timing-Centric Approach.
IEEE Trans. Computers, 2009

Using age registers for a simple load-store queue filtering.
J. Syst. Archit., 2009

Maximizing power efficiency with asymmetric multicore systems.
Commun. ACM, 2009

Endmember Extraction from Hyperspectral Imagery using a Parallel Ensemble Approach with Consensus Analysis.
Proceedings of the IEEE International Geoscience & Remote Sensing Symposium, 2009

Introduction.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

System-level process variability compensation on memory organizations: on the scalability of multi-mode memories.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting.
IEEE Trans. Parallel Distributed Syst., 2008

Combining system scenarios and configurable memories to tolerate unpredictability.
ACM Trans. Design Autom. Electr. Syst., 2008

GPU for Parallel On-Board Hyperspectral Image Processing.
Int. J. High Perform. Comput. Appl., 2008

Energy reduction of the fetch mechanism through dynamic adaptation.
IET Comput. Digit. Tech., 2008

Improving Search Engines Performance on Multithreading Processors.
Proceedings of the High Performance Computing for Computational Science, 2008

Improving Priority Enforcement via Non-Work-Conserving Scheduling.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Exploiting Hybrid Parallelism in Web Search Engines.
Proceedings of the Euro-Par 2008, 2008

2007
Parallel Morphological Endmember Extraction Using Commodity Graphics Hardware.
IEEE Geosci. Remote. Sens. Lett., 2007

Multigrid Smoothers on Multicore Architectures.
Proceedings of the Parallel Computing: Architectures, 2007

2006
A Load-Store Queue Design Based on Predictive State Filtering.
J. Low Power Electron., 2006

Enhancing the Performance of Multigrid Smoothers in Simultaneous Multithreading Architectures.
Proceedings of the High Performance Computing for Computational Science, 2006

DMDC: Delayed Memory Dependence Checking through Age-Based Filtering.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

System-level process variability compensation on memory organizations of dynamic applications: a case study.
Proceedings of the 7th International Symposium on Quality of Electronic Design (ISQED 2006), 2006

Substituting associative load queue with simple hash tables in out-of-order microprocessors.
Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Parallel Hyperspectral Image Processing on Commodity Graphics Hardware.
Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPP Workshops 2006), 2006

2005
A Power-Efficient and Scalable Load-Store Queue Design.
Proceedings of the Integrated Circuit and System Design, 2005

Pack Transposition: Enhancing Superword Level Parallelism Exploitation.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

JPEG2000 Optimization in General Purpose Microprocessors.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

A Speculative Parallel Algorithm for Self-Organizing Maps.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Energy-aware fetch mechanism: trace cache and BTB customization.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Improving superword level parallelism support in modern compilers.
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2005

2004
Exploiting Multilevel Parallelism Within Modern Microprocessors: DWT as a Case Study.
Proceedings of the High Performance Computing for Computational Science, 2004

2003
A parallel multigrid solver for viscous flows on anisotropic structured grids.
Parallel Comput., 2003

Customizing the Branch Predictor to Reduce Complexity and Energy Consumption.
IEEE Micro, 2003

Hybrid Parallelization of a Compact Genetic Algorithm.
Proceedings of the 11th Euromicro Workshop on Parallel, 2003

Branch prediction on demand: an energy-efficient solution.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Vectorization of Multigrid Codes Using SIMD ISA Extensions.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Vectorization of the 2D Wavelet Lifting Transform Using SIMD Extensions.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

2002
Analysis of simulation-adapted SPEC 2000 benchmarks.
SIGARCH Comput. Archit. News, 2002

Wavelet Transform for Large Scale Image Processing on Modern Microprocessors.
Proceedings of the High Performance Computing for Computational Science, 2002

Beowulf Performance in CFD Multigrid Applications.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

A Parallel Cloth Simulator Using Multilevel Algorithms.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Parallel Wavelet Transform for Large Scale Image Processing.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation.
Proceedings of the High Performance Computing, 2002

2001
A parallel multigrid solver for 3D convection and convection-diffusion problems.
Parallel Comput., 2001

Parallel Multigrid for Anisotropic Elliptic Equations.
J. Parallel Distributed Comput., 2001

A Multigrid Solver for the Incompressible Navier-Stokes Equations on a Beowulf-Class System.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

2000
Data Locality Exploitation in the Decomposition of Regular Domain Problems.
IEEE Trans. Parallel Distributed Syst., 2000

A robust multigrid solver on parallel computers.
Proceedings of the Eight Euromicro Workshop on Parallel and Distributed Processing, 2000

Impact of PE Mapping on Cray T3E Message-Passing Performance.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
An environment to develop parallel code for solving partial differential equations based-problems.
J. Syst. Archit., 1999

A Parallel Robust Multigrid Algorithm Based on Semi-Coarsening.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999

A Method for Model Parameter Identification Using Parallel Genetic Algorithms.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999

Parallel resolution of alternating-line processes by means of pipelining techniques.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999

Solution of Alternating-Line Processes on Modern Parallel Computers.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Message Passing Evaluation and Analysis on Cray T3E and SGI Origin 2000 Systems.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998
Partitioning Regular Domains on Modern Parallel Computers.
Proceedings of the Vector and Parallel Processing, 1998

1997
Automatic Generation of Parallel Code for Solving PDE Based Problems.
Proceedings of the IASTED International Conference on Parallel and Distributed Systems, 1997

