Xavier Martorell

According to our database, Xavier Martorell authored at least 161 papers between 1995 and 2019.

Bibliography

2019
Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems.
J. Parallel Distrib. Comput., 2019

BLAS-3 Optimized by OmpSs Regions (LASs Library).
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

2018
Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.
The Journal of Supercomputing, 2018

cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs.
Concurrency and Computation: Practice and Experience, 2018

Analyzing the impact of communication imbalance in high-speed networks.
Concurrency and Computation: Practice and Experience, 2018

Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms.
Comput. J., 2018


MPI+OpenMP Tasking Scalability for the Simulation of the Human Brain: Human Brain Project.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Variable Batched DGEMM.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Analysis of the Impact Factors on Data Error Propagation in HPC Applications.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Application Acceleration on FPGAs with OmpSs@FPGA.
Proceedings of the International Conference on Field-Programmable Technology, 2018


Safe Parallelism: Compiler Analysis Techniques for Ada and OpenMP.
Proceedings of the Reliable Software Technologies - Ada-Europe 2018, 2018

2017
The Hipster Approach for Improving Cloud System Efficiency.
ACM Trans. Comput. Syst., 2017

The AXIOM platform for next-generation cyber physical systems.
Microprocessors and Microsystems - Embedded Hardware Design, 2017

Automatic Scan Parallelization in OpenMP.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset.
Proceedings of the Parallel Computing is Everywhere, 2017

A Functional Safety OpenMP* for Critical Real-Time Embedded Systems.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Characterizing and Improving the Performance of Many-Core Task-Based Parallel Programming Runtimes.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

cuHinesBatch: Solving Multiple Hines systems on GPUs Human Brain Project.
Proceedings of the International Conference on Computational Science, 2017

Hipster: Hybrid Task Manager for Latency-Critical Cloud Workloads.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

OpenMP Tasking Model for Ada: Safety and Correctness.
Proceedings of the Reliable Software Technologies - Ada-Europe 2017, 2017

Exploiting Parallelism on GPUs and FPGAs with OmpSs.
Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, 2017

2016
CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters.
IEEE Trans. Parallel Distrib. Syst., 2016

Combining Static and Dynamic Data Coalescing in Unified Parallel C.
IEEE Trans. Parallel Distrib. Syst., 2016

MASA: A Multiplatform Architecture for Sequence Aligners with Block Pruning.
TOPC, 2016

Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs.
Parallel Computing, 2016

The AXIOM software layers.
Microprocessors and Microsystems - Embedded Hardware Design, 2016


REPP-H: Runtime Estimation of Power and Performance on Heterogeneous Data Centers.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Analyzing Data-Error Propagation Effects in High-Performance Computing.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Work-efficient parallel non-maximum suppression for embedded GPU architectures.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

RePP-C: Runtime estimation of performance-power with workload consolidation in CMPs.
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016


A lightweight OpenMP4 run-time for embedded systems.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
Resource-Aware Task Scheduling.
ACM Trans. Embedded Comput. Syst., 2015

Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories.
IEEE Trans. Computers, 2015

Coarse-Grain Performance Estimator for Heterogeneous Parallel Computing Architectures like Zynq All-Programmable SoC.
CoRR, 2015

The AXIOM project (Agile, eXtensible, fast I/O Module).
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Exploring Memory Error Vulnerability for Parallel Programming Models.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Towards Task-Parallel Reductions in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Optimizing Overlapped Memory Accesses in User-directed Vectorization.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

A Methodology to Build Models and Predict Performance-Power in CMPs.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015

Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Boosting irregular array Reductions through In-lined Block-ordering on fast processors.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015


Compiler analysis for OpenMP tasks correctness.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Leveraging OmpSs to Exploit Hardware Accelerators.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Reducing Compiler-Inserted Instrumentation in Unified-Parallel-C Code Generation.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Task-Parallel Reductions in OpenMP and OmpSs.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Analyzing the impact of programming models for efficient communication overlap in high-speed networks.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

OmpSs@Zynq all-programmable SoC ecosystem.
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Task-Based Programming with OmpSs and Its Application.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013
A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013

Rendering of Bézier Surfaces on Handheld Devices.
Journal of WSCG, 2013

Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013

Heterogeneous tasking on SMP/FPGA SoCs: The case of OmpSs and the Zynq.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

A Proposal for Task-Generating Loops in OpenMP.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Implementing OmpSs support for regions of data in architectures with multiple address spaces.
Proceedings of the International Conference on Supercomputing, 2013

Improving performance of all-to-all communication through loop scheduling in PGAS environments.
Proceedings of the International Conference on Supercomputing, 2013

Improving communication in PGAS environments: static and dynamic coalescing in UPC.
Proceedings of the International Conference on Supercomputing, 2013

2012
DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012

Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Generation Comp. Syst., 2012

POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Hardware-software coherence protocol for the coexistence of caches and local memories.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Compiler Automatic Discovery of OmpSs Task Dependencies.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

Productive Programming of GPU Clusters with OmpSs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Accelerating Boosting-Based Face Detection on GPUs.
Proceedings of the 41st International Conference on Parallel Processing, 2012

On the Instrumentation of OpenMP and OmpSs Tasking Constructs.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

Automatic communication coalescing for irregular computations in UPC language.
Proceedings of the Center for Advanced Studies on Collaborative Research, 2012

2D-FMFI SAR application on HPC architectures with OmpSs parallel programming model.
Proceedings of the 2012 NASA/ESA Conference on Adaptive Hardware and Systems, 2012

2011
OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures.
Parallel Processing Letters, 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
International Journal of Parallel Programming, 2011

Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011

Implementation of a hierarchical N-body simulator using the OmpSs programming model.
Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011

Poster: programming clusters of GPUs with OmpSs.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Real-time GPU-based face detection in HD video sequences.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Productive Cluster Programming with OmpSs.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Scalability Evaluation of a Polymorphic Register File: A CG Case Study.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distrib. Syst., 2010

Transient Congestion Avoidance in Software Distributed Shared Memory Systems.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010

DMA++: on the fly data realignment for on-chip memories.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Analysis of Task Offloading for Accelerators.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010

Reducing data access latency in SDSM systems using runtime optimizations.
Proceedings of the 2010 conference of the Centre for Advanced Studies on Collaborative Research, 2010

2009
OpenMP extensions for FPGA accelerators.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Unrolling Loops Containing Task Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP.
Proceedings of the ICPP 2009, 2009

OpenMP tasking analysis for programmers.
Proceedings of the 2009 conference of the Centre for Advanced Studies on Collaborative Research, 2009

2008
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

OpenMP tasks in IBM XL compilers.
Proceedings of the 2008 conference of the Centre for Advanced Studies on Collaborative Research, 2008

Hybrid access-specific software cache techniques for the cell BE architecture.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
High-Performance Embedded Architecture and Compilation Roadmap.
Trans. HiPEAC, 2007

A Proposal for Error Handling in OpenMP.
International Journal of Parallel Programming, 2007

A Streaming Machine Description and Programming Model.
Proceedings of the Embedded Computer Systems: Architectures, 2007

A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Transactional Memory and OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Support for OpenMP tasks in Nanos v4.
Proceedings of the 2007 conference of the Centre for Advanced Studies on Collaborative Research, 2007

2006
Running OpenMP applications efficiently on an everything-shared SDSM.
J. Parallel Distrib. Comput., 2006

Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications.
J. Parallel Distrib. Comput., 2006

Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture.
J. Embedded Computing, 2006

Runtime Address Space Computation for SDSM Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

A Proposal for Error Handling in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

Techniques supporting threadprivate in OpenMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Performance-Driven Processor Allocation.
IEEE Trans. Parallel Distrib. Syst., 2005

Blue Gene/L performance tools.
IBM Journal of Research and Development, 2005

Design and implementation of message-passing services for the Blue Gene/L supercomputer.
IBM Journal of Research and Development, 2005

Experiences Parallelizing a Web Server with OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Optimization of MPI collective communication on BlueGene/L systems.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

2004
Page Migration with Dynamic Space-Sharing Scheduling Policies: The Case of the SGI O2000.
International Journal of Parallel Programming, 2004

Architecture and Performance of the BlueGene/L Message Layer.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Running OpenMP Applications Efficiently on an Everything-Shared SDSM.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004


2003
Automatic multilevel parallelization using OpenMP.
Scientific Programming, 2003

An Overview Of The Bluegene/L System Software Organization.
Parallel Processing Letters, 2003

Is the Schedule Clause Really Necessary in OpenMP?
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Evaluation of OpenMP for the Cyclops Multithreaded Architecture.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Enabling Dual-Core Mode in BlueGene/L: Challenges and Solutions.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

MPI on BlueGene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

Application/Kernel Cooperation Towards the Efficient Execution of Shared-Memory Parallel Java Codes.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Evaluation of the memory page migration influence in the system performance: the case of the SGI O2000.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

An Overview of the Blue Gene/L System Software Organization.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

2002
Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

2001
Defining and Supporting Pipelined Executions in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Improving Gang Scheduling through job performance analysis and malleability.
Proceedings of the 15th international conference on Supercomputing, 2001

Complex Pipelined Executions in OpenMP Parallel Applications.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

2000
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurrency - Practice and Experience, 2000

Performance-Driven Processor Allocation.
Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

1999
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999

Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999

1998
Experiences on implementing PARMACS macros to run the SPLASH-2 suite on multiprocessors.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

Kernel-level Scheduling for the Nano-threads Programming Model.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Analysis of Several Scheduling Algorithms under the Nano-Thread Programming Model.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996
A Library Implementation of the Nano-Threads Programming Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
The eXc Model: Scheduler-Activations on Mach 3.0.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995

