Andrés Tomás

Antonio-Manuel Vidal-Maciá

Int. J. High Perform. Comput. Appl., July, 2023

Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.

Germán Fabregat

J. Supercomput., June, 2023

Compressed basis GMRES on high-performance graphics processing units.

Thomas Grützmacher

Int. J. High Perform. Comput. Appl., March, 2023

Reformulating the direct convolution for high-performance deep learning inference on ARM processors.

Upasana Sridhar

J. Syst. Archit., February, 2023

Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors.

Concurr. Comput. Pract. Exp., 2023

Tall-and-Skinny QR Factorization for Clusters of GPUs Using High-Performance Building Blocks.

Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

2022

BestOf: an online implementation selector for the training and inference of deep neural networks.

J. Supercomput., 2022

High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.

Adrián Castelló

Sergio Barrachina

Manuel F. Dolz

Pau San Juan

J. Syst. Archit., 2022

Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units.

Thomas Grützmacher

Concurr. Comput. Pract. Exp., 2022

2020

Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors.

J. Supercomput., 2020

Compressed Basis GMRES on High Performance GPUs.

Thomas Grützmacher

CoRR, 2020

Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs.

Yuhsiang M. Tsai

Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

2019

FloatX: A C++ Library for Customized Floating-Point Arithmetic.

A. Cristiano I. Malossi

ACM Trans. Math. Softw., 2019

Dynamic look-ahead in the reduction to band form for the singular value decomposition.

Rocío Carratalá-Sáez

Parallel Comput., 2019

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD.

José R. Herrero

Numer. Algorithms, 2019

Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors.

Proceedings of the Euro-Par 2019: Parallel Processing, 2019

2018

Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems.

Goran Flegar

Vedran Novakovic

Proceedings of the High Performance Computing, 2018

Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators.

Andrés E. Tomás Dominguez

Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Fast Blocking of Householder Reflectors on Graphics Processors.

Dimitrios S. Nikolopoulos

Proceedings of the 26th Euromicro International Conference on Parallel, 2018

The transprecision computing paradigm: Concept, design, and applications.

A. Cristiano I. Malossi

Eric Flamand

Norbert Wehn

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

2017

Empirical Study and Modeling of Vehicular Communications at Intersections in the 5 GHz Band.

Seilendria A. Hadiwardoyo

Carlos T. Calafate

Mob. Inf. Syst., 2017

Two-Sided Reduction to Compact Band Forms with Look-Ahead.

José R. Herrero

Seilendria A. Hadiwardoyo

CoRR, 2017

On the impact of urban intersection characteristics in vehicular to vehicular (V2V) communications.

Carlos T. Calafate

Proceedings of the 13th International Wireless Communications and Mobile Computing Conference, 2017

Evaluating the use of sub-gigahertz wireless technologies to improve message delivery in opportunistic networks.

Marco Zennaro

Proceedings of the 14th IEEE International Conference on Networking, Sensing and Control, 2017

Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning.

Jack J. Dongarra

Goran Flegar

Proceedings of the International Conference on Computational Science, 2017

Selecting the optimal buffer management for opportunistic networks both in pedestrian and vehicular contexts.

Proceedings of the 14th IEEE Annual Consumer Communications & Networking Conference, 2017

Mobility as the Main Enabler of Opportunistic Data Dissemination in Urban Scenarios.

Anna Förster

Asanga Udugama

Proceedings of the Ad-hoc, Mobile, and Wireless Networks, 2017

2016

Friendly-Sharing: Improving the Performance of City Sensoring through Contact-Based Messaging Applications.

Sensors, 2016

MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment.

Inf. Sci., 2016

Evaluating the Impact of Data Transfer Time and Mobility Patterns in Opportunistic Networks.

Proceedings of the 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, 2016

Improving Message Delivery Performance in Opportunistic Networks Using a Forced-Stop Diffusion Scheme.

Proceedings of the Ad-hoc, Mobile, and Wireless Networks - 15th International Conference, 2016

2014

Inexact Sequence Mapping Study Cases: Hybrid GPU Computing and Memory Demanding Indexes.

Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2014

Robust Error Correction for De Novo Assembly via Spectral Partitioning and Sequence Alignment.

Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2014

A Fast Sparse Block Circulant Matrix Vector Product.

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2012

Using GPUs for the Exact Alignment of Short-Read Genetic Sequences by Means of the Burrows-Wheeler Transform.

José Salavert Torres

Ignacio Blanquer Espert

Andrés Tomás Dominguez

IEEE ACM Trans. Comput. Biol. Bioinform., 2012

Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors.

Zhaojun Bai

Proceedings of the High Performance Computing for Computational Science, 2012

Advancing Large Scale Many-Body QMC Simulations on GPU Accelerated Multicore Systems.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2007

Parallel Arnoldi eigensolvers with enhanced scalability via global communications rearrangement.

José E. Román

Parallel Comput., 2007

2006

Evaluation of Several Variants of Explicitly Restarted Lanczos Eigensolvers and Their Parallel Implementations.

José E. Román

Proceedings of the High Performance Computing for Computational Science, 2006

2005

A Parallel Variant of the Gram-Schmidt Process with Reorthogonalization.

José E. Román