Rafael Rodríguez-Sánchez

ORCID: 0000-0001-8789-3953

  • Universidad Complutense de Madrid, Madrid, Spain

According to our database, Rafael Rodríguez-Sánchez authored at least 49 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
Int. J. High Perform. Comput. Appl., 2024

Inference with Transformer Encoders on ARM and RISC-V Multicore Processors.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures.
J. Parallel Distributed Comput., May, 2023

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors.
CoRR, 2023

Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs.
Concurr. Comput. Pract. Exp., 2023

QR Factorization Using Malleable BLAS on Multicore Processors.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors.
J. Supercomput., 2021

A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs.
Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM@PPoPP 2021), 2021

Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Integration and exploitation of intra-routine malleability in BLIS.
J. Supercomput., 2020

Programming parallel dense matrix factorizations with look-ahead and OpenMP.
Clust. Comput., 2020

Dynamic look-ahead in the reduction to band form for the singular value decomposition.
Parallel Comput., 2019

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD.
Numer. Algorithms, 2019

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting.
IEEE Access, 2019

Static scheduling of the LU factorization with look-ahead on asymmetric multicore processors.
Parallel Comput., 2018

Energy balance between voltage-frequency scaling and resilience for linear algebra routines on low-power multicore architectures.
Parallel Comput., 2018

Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors.
Parallel Comput., 2018

Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors.
J. Comput. Sci., 2018

Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Time and energy modeling of a high-performance multi-threaded Cholesky factorization.
J. Supercomput., 2017

Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations.
Parallel Comput., 2017

Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors.
J. Real Time Image Process., 2017

Two-Sided Reduction to Compact Band Forms with Look-Ahead.
CoRR, 2017

Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Static Versus Dynamic Task Scheduling of the LU Factorization on ARM big.LITTLE Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Tiles- and WPP-based HEVC Decoding on Asymmetric Multi-core Processors.
Proceedings of the Third IEEE International Conference on Multimedia Big Data, 2017

Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors.
Clust. Comput., 2016

Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Multimedia Communications Using a Fast and Flexible DVC to H.264/AVC/SVC Transcoder.
J. Signal Process. Syst., 2015

Fast video transcoding from HEVC to VP9.
IEEE Trans. Consumer Electron., 2015

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures.
Simul. Model. Pract. Theory, 2015

Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors.
CoRR, 2015

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors.
CoRR, 2015

HEVC to VP9 transcoder.
Proceedings of the 2015 Visual Communications and Image Processing, 2015

Time and energy modeling of an INTRA-ONLY HEVC encoder.
Proceedings of the 2015 Visual Communications and Image Processing, 2015

An smpUMHexagonS-based motion estimation algorithm for heterogeneous architectures.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Parallel performance and energy efficiency of modern video encoders on multithreaded architectures.
Proceedings of the 22nd European Signal Processing Conference, 2014

H.264/AVC inter prediction for heterogeneous computing systems.
J. Supercomput., 2013

H.264/AVC inter prediction on accelerator-based multi-core systems.
Multim. Tools Appl., 2013

Adapting hierarchical bidirectional inter prediction on a GPU-based platform for 2D and 3D H.264 video coding.
EURASIP J. Adv. Signal Process., 2013

3D high definition video coding on a GPU-based heterogeneous system.
Comput. Electr. Eng., 2013

Fast transrating for high efficiency video coding based on machine learning.
Proceedings of the IEEE International Conference on Image Processing, 2013

Low delay H.264/AVC bidirectional inter prediction on a GPU.
Proceedings of the IEEE International Conference on Image Processing, 2013

Optimizing H.264/AVC interprediction on a GPU-based framework.
Concurr. Comput. Pract. Exp., 2012

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC.
Proceedings of the Advances in Multimedia Modeling - 18th International Conference, 2012

A Fast GPU-Based Motion Estimation Algorithm for HD 3D Video Coding.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Reducing complexity in H.264/AVC motion estimation by using a GPU.
Proceedings of the IEEE 13th International Workshop on Multimedia Signal Processing (MMSP 2011), 2011

A GPU-Based DVC to H.264/AVC Transcoder.
Proceedings of the Hybrid Artificial Intelligence Systems, 5th International Conference, 2010
