Eduard Ayguadé

According to our database, Eduard Ayguadé authored at least 355 papers between 1989 and 2018.

Bibliography

2018
Memory Controller for Vector Processor.
Signal Processing Systems, 2018

Asynchronous and Exact Forward Recovery for Detected Errors in Iterative Solvers.
IEEE Trans. Parallel Distrib. Syst., 2018

Automated curation of brand-related social media images with deep learning.
Multimedia Tools Appl., 2018

EMVS: Embedded Multi Vector-core System.
Journal of Systems Architecture - Embedded Systems Design, 2018

An approach to task-based parallel programming for undergraduate students.
J. Parallel Distrib. Comput., 2018

On the Behavior of Convolutional Nets for Feature Extraction.
J. Artif. Intell. Res., 2018

A Visual Distance for WordNet.
CoRR, 2018

Low-Precision Floating-Point Schemes for Neural Network Training.
CoRR, 2018

Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms.
Comput. J., 2018

Graph partitioning applied to DAG scheduling to reduce NUMA effects.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

HPC Benchmarking: Scaling Right and Looking Beyond the Average.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

A Visual Distance for WordNet.
Proceedings of the Artificial Intelligence Research and Development, 2018

2017
Task Scheduling Techniques for Asymmetric Multi-Core Systems.
IEEE Trans. Parallel Distrib. Syst., 2017

Main Memory in HPC: Do We Need More or Could We Live with Less?
TACO, 2017

The AXIOM platform for next-generation cyber physical systems.
Microprocessors and Microsystems - Embedded Hardware Design, 2017

Full-Network Embedding in a Multimodal Embedding Pipeline.
CoRR, 2017

Fluid Communities: A Community Detection Algorithm.
CoRR, 2017

Building Graph Representations of Deep Vector Embeddings.
CoRR, 2017

An Out-of-the-box Full-network Embedding for Convolutional Neural Networks.
CoRR, 2017

On the Behavior of Convolutional Nets for Feature Extraction.
CoRR, 2017

Identifying the potential of Near Data Computing for Apache Spark.
CoRR, 2017

A visual embedding for the unsupervised extraction of abstract semantics.
Cognitive Systems Research, 2017

Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Efficient exception handling support for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Identifying the potential of near data processing for apache spark.
Proceedings of the International Symposium on Memory Systems, 2017

Adaptive and Architecture-Independent Task Granularity for Recursive Applications.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Improving the Integration of Task Nesting and Dependencies in OpenMP.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Characterizing and Improving the Performance of Many-Core Task-Based Parallel Programming Runtimes.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

A Directive-Based Approach to Perform Persistent Checkpoint/Restart.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Exploiting Key-Value Data Stores Scalability for HPC.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Efficient Data Sharing on Heterogeneous Systems.
Proceedings of the 46th International Conference on Parallel Processing, 2017

ParaView + Alya + D8tree: Integrating High Performance Computing and High Performance Data Analytics.
Proceedings of the International Conference on Computational Science, 2017

Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm.
Proceedings of the Complex Networks & Their Applications VI, 2017

Low-latency multi-threaded ensemble learning for dynamic big data streams.
Proceedings of the 2017 IEEE International Conference on Big Data, BigData 2017, 2017

2016
CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters.
IEEE Trans. Parallel Distrib. Syst., 2016

MASA: A Multiplatform Architecture for Sequence Aligners with Block Pruning.
TOPC, 2016

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite.
TACO, 2016

The AXIOM software layers.
Microprocessors and Microsystems - Embedded Hardware Design, 2016

Hierarchical Hyperlink Prediction for the WWW.
CoRR, 2016

Limitations and Alternatives for the Evaluation of Large-scale Link Prediction.
CoRR, 2016

Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study.
CoRR, 2016


MUSA: a multi-level simulation approach for next-generation HPC machines.
Proceedings of the International Conference for High Performance Computing, 2016

Large-Memory Nodes for Energy Efficient High-Performance Computing.
Proceedings of the Second International Symposium on Memory Systems, 2016

Multiple Target Task Sharing Support for the OpenMP Accelerator Model.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

TaskPoint: Sampled simulation of task-based programs.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

CATA: Criticality Aware Task Acceleration for Multicore Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.
Proceedings of the 2016 International Conference on Supercomputing, 2016

D8-tree: a de-normalized approach for multidimensional data analysis on key-value databases.
Proceedings of the 17th International Conference on Distributed Computing and Networking, 2016


On the Representativeness of Convolutional Neural Networks Layers.
Proceedings of the Artificial Intelligence Research and Development, 2016

User-generated content curation with deep convolutional neural networks.
Proceedings of the 2016 IEEE International Conference on Big Data, 2016

Micro-Architectural Characterization of Apache Spark on Batch and Stream Processing Workloads.
Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), 2016

Node architecture implications for in-memory data analytics on scale-in clusters.
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, 2016

Echo State Hoeffding Tree Learning.
Proceedings of The 8th Asian Conference on Machine Learning, 2016

POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories.
IEEE Trans. Computers, 2015

AMC: Advanced Multi-accelerator Controller.
Parallel Computing, 2015

DaSH: A benchmark suite for hybrid dataflow and shared memory programming models.
Parallel Computing, 2015

Extracting Visual Patterns from Deep Learning Representations.
CoRR, 2015

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server.
CoRR, 2015

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server.
CoRR, 2015

Tareador: a tool to unveil parallelization strategies at undergraduate level.
Proceedings of the Workshop on Computer Architecture Education, 2015

SSMART: smart scheduling of multi-architecture tasks on heterogeneous systems.
Proceedings of the Second Workshop on Accelerator Programming using Directives, 2015

Exploring dynamic parallelism in OpenMP.
Proceedings of the Second Workshop on Accelerator Programming using Directives, 2015

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers.
Proceedings of the International Conference for High Performance Computing, 2015

The AXIOM project (Agile, eXtensible, fast I/O Module).
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Experiences of Using Cassandra for Molecular Dynamics Simulations.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Towards Task-Parallel Reductions in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Self-Tuned Software-Managed Energy Reduction in InfiniBand Links.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

AMA: Asynchronous Management of Accelerators for Task-based Programming Models.
Proceedings of the International Conference on Computational Science, 2015

Automatic Query Driven Data Modelling in Cassandra.
Proceedings of the International Conference on Computational Science, 2015

Auto-Tuning OmpSs-OpenCL Kernels Across GPU Machines.
Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015



Evaluating Link Prediction on Large Graphs.
Proceedings of the Artificial Intelligence Research and Development, 2015

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2015

ViPS: Visual processing system for medical imaging.
Proceedings of the 8th International Conference on Biomedical Engineering and Informatics, 2015

Multimedia Big Data Computing for In-Depth Event Analysis.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

Spark deployment and performance evaluation on the MareNostrum supercomputer.
Proceedings of the 2015 IEEE International Conference on Big Data, 2015

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server.
Proceedings of the Fifth IEEE International Conference on Big Data and Cloud Computing, 2015

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.
Proceedings of the 2015 International Conference on Parallel Architecture and Compilation, 2015

2014
PMSS: A programmable memory system and scheduler for complex memory patterns.
J. Parallel Distrib. Comput., 2014

A methodology for the evaluation of high response time on E-commerce users and sales.
Information Systems Frontiers, 2014

Automatic Exploration of Potential Parallelism in Sequential Applications.
Proceedings of the Supercomputing - 29th International Conference, 2014

Scalability and Parallel Execution of OmpSs-OpenCL Tasks on Heterogeneous CPU-GPU Environment.
Proceedings of the Supercomputing - 29th International Conference, 2014

A data flow language to develop high performance computing DSLs.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Leveraging OmpSs to Exploit Hardware Accelerators.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

PAMS: Pattern Aware Memory System for embedded systems.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Towards the Cloudification of the Social Networks Analytics.
Proceedings of the Modeling Decisions for Artificial Intelligence, 2014

Towards Transactional Memory for OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

On the Roles of the Programmer, the Compiler and the Runtime System When Programming Accelerators in OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Task-Parallel Reductions in OpenMP and OmpSs.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Software-Managed Power Reduction in Infiniband Links.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

AMMC: Advanced Multi-Core Memory Controller.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

MAPC: Memory access pattern based controller.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

APMC: advanced pattern based memory controller (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Task-Based Programming with OmpSs and Its Application.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Profit-aware cloud resource provisioner for ecommerce.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Adaptive MapReduce Scheduling in Shared Environments.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness.
Proceedings of the 2014 IEEE International Conference on Big Data, 2014

PVMC: Programmable Vector Memory Controller.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Stand-Alone Memory Controller for Graphics System.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
Deadline-Based MapReduce Workload Management.
IEEE Trans. Network and Service Management, 2013

A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013

A template system for the efficient compilation of domain abstractions onto reconfigurable computers.
Journal of Systems Architecture - Embedded Systems Design, 2013

Programmability and portability for exascale: Top down programming methodology and tools with StarSs.
J. Comput. Science, 2013

Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013

Enabling Distributed Key-Value Stores with Low Latency-Impact Snapshot Support.
Proceedings of the 2013 IEEE 12th International Symposium on Network Computing and Applications, 2013

Self-Adaptive OmpSs Tasks in Heterogeneous Environments.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing OmpSs support for regions of data in architectures with multiple address spaces.
Proceedings of the International Conference on Supercomputing, 2013

Aeneas: A Tool to Enable Applications to Effectively Use Non-Relational Databases.
Proceedings of the International Conference on Computational Science, 2013

Loop level speculation in a task based programming model.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Autonomic Placement of Mixed Batch and Transactional Workloads.
IEEE Trans. Parallel Distrib. Syst., 2012

DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012

Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Generation Comp. Syst., 2012

POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Hardware-software coherence protocol for the coexistence of caches and local memories.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Integrating Dataflow Abstractions into the Shared Memory Model.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

OmpSs-OpenCL Programming Model for Heterogeneous Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Productive Programming of GPU Clusters with OmpSs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Task-based parallel breadth-first search in heterogeneous environments.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Optimizing resource utilization with software-based temporal multi-threading (stmt).
Proceedings of the 19th International Conference on High Performance Computing, 2012

PPMC: Hardware scheduling and memory management support for multi accelerators.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

On the Instrumentation of OpenMP and OmpSs Tasking Constructs.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Transactional Access to Shared Memory in StarSs, a Task Based Programming Model.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Topic 11: Multicore and Manycore Programming.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

BSArc: blacksmith streaming architecture for HPC accelerators.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

PPMC: A Programmable Pattern Based Memory Controller.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

Supporting stateful tasks in a dataflow graph.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Assessing Accelerator-Based HPC Reverse Time Migration.
IEEE Trans. Parallel Distrib. Syst., 2011

OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures.
Parallel Processing Letters, 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
International Journal of Parallel Programming, 2011

Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011

TARCAD: A template architecture for reconfigurable accelerator designs.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Hybrid Parallel Programming with MPI/StarSs.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Non-intrusive Estimation of QoS Degradation Impact on E-Commerce User Satisfaction.
Proceedings of The Tenth IEEE International Symposium on Networking Computing and Applications, 2011

Resource-Aware Adaptive Scheduling for MapReduce Clusters.
Proceedings of the Middleware 2011, 2011

Poster: programming clusters of GPUs with OMPSs.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Productive Cluster Programming with OmpSs.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distrib. Syst., 2010

Guest Editors' Introduction.
International Journal of Parallel Programming, 2010

Extending OpenMP to Survive the Heterogeneous Multi-Core Era.
International Journal of Parallel Programming, 2010

Holistic Management for a more Energy-Efficient Cloud Computing.
ERCIM News, 2010

A survey on performance management for internet applications.
Concurrency and Computation: Practice and Experience, 2010

Effective communication and computation overlap with hybrid MPI/SMPSs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Transient Congestion Avoidance in Software Distributed Shared Memory Systems.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

Task Superscalar: An Out-of-Order Task Pipeline.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

A Proposal for User-Defined Reductions in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

An Extension to Improve OpenMP Tasking Control.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Characterization of workload and resource consumption for an online travel and booking site.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Overlapping communication and computation by using a hybrid MPI/SMPSs approach.
Proceedings of the 24th International Conference on Supercomputing, 2010

Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010

Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters.
Proceedings of the 39th International Conference on Parallel Processing, 2010

A CellBE-based HPC Application for the Analysis of Vulnerabilities in Cryptographic Hash Functions.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

DMA++: on the fly data realignment for on-chip memories.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Analysis of Task Offloading for Accelerators.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010

FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

StarSsCheck: A Tool to Find Errors in Task-Based Parallel Programs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Reducing data access latency in SDSM systems using runtime optimizations.
Proceedings of the 2010 conference of the Centre for Advanced Studies on Collaborative Research, 2010

2009
The Design of OpenMP Tasks.
IEEE Trans. Parallel Distrib. Syst., 2009

Guest Editors' Introduction.
International Journal of Parallel Programming, 2009

A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks.
International Journal of Parallel Programming, 2009

Hierarchical Task-Based Programming With StarSs.
IJHPCA, 2009

BSC Vision Towards Exascale.
IJHPCA, 2009

Creating Power-Aware Middleware for Energy-Efficient Data Centres.
ERCIM News, 2009

The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors.
Proceedings of the Embedded Computer Systems: Architectures, 2009

OpenMP extensions for FPGA accelerators.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Atomic quake: using transactional memory in an interactive multiplayer game server.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Turbocharging boosted transactions or: how i learnt to stop worrying and love longer transactions.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Batch Job Profiling and Adaptive Profile Enforcement for Virtualized Environments.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Impact of the Memory Hierarchy on Shared Memory Architectures in Multicore Programming Models.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Unrolling Loops Containing Task Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

QuakeTM: parallelizing a complex sequential application using transactional memory.
Proceedings of the 23rd international conference on Supercomputing, 2009

Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP.
Proceedings of the ICPP 2009, 2009

Speeding Up Distributed MapReduce Applications Using Hardware Accelerators.
Proceedings of the ICPP 2009, 2009

CellMT: A cooperative multithreading library for the Cell/B.E.
Proceedings of the 16th International Conference on High Performance Computing, 2009

Exploiting memory customization in FPGA for 3D stencil computations.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

Introduction.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Mapping stream programs onto heterogeneous multiprocessor systems.
Proceedings of the 2009 International Conference on Compilers, 2009

OpenMP tasking analysis for programmers.
Proceedings of the 2009 conference of the Centre for Advanced Studies on Collaborative Research, 2009

2008
Nebelung: Execution Environment for Transactional OpenMP.
International Journal of Parallel Programming, 2008

Guest Editors Introduction: Special Issue on OpenMP.
International Journal of Parallel Programming, 2008

A hybrid connector for efficient web servers.
IJHPCN, 2008

Power-efficient VLIW design using clustering and widening.
IJES, 2008

Dynamic CPU provisioning for self-managed secure web applications in SMP hosting platforms.
Computer Networks, 2008

An adaptive cut-off for task parallelism.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Utility-based placement of dynamic Web applications with fairness goals.
Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Pervasive Management for Ubiquitous Networks and Services, 2008

Enabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement.
Proceedings of the Middleware 2008, 2008

Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Extending the OpenMP Tasking Model to Allow Dependent Tasks.
Proceedings of the OpenMP in a New Era of Parallelism, 4th International Workshop, 2008

Evaluation of OpenMP Task Scheduling Strategies.
Proceedings of the OpenMP in a New Era of Parallelism, 4th International Workshop, 2008

Understanding tuning complexity in multithreaded and hybrid web servers.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Improving Web Server Performance Through Main Memory Compression.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

Tailoring Resources: The Energy Efficient Consolidation Strategy Goes Beyond Virtualization.
Proceedings of the 2008 International Conference on Autonomic Computing, 2008

Managing SLAs of heterogeneous workloads using dynamic application placement.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008

OpenMP tasks in IBM XL compilers.
Proceedings of the 2008 conference of the Centre for Advanced Studies on Collaborative Research, 2008

Hybrid access-specific software cache techniques for the cell BE architecture.
Proceedings of the 17th International Conference on Parallel Architecture and Compilation Techniques, 2008

2007
Transactional Memory: An Overview.
IEEE Micro, 2007

A Proposal for Error Handling in OpenMP.
International Journal of Parallel Programming, 2007

Introduction.
International Journal of Parallel Programming, 2007

Special Issue on OpenMP - Guest Editors' Introduction.
International Journal of Parallel Programming, 2007

Designing an overload control strategy for secure e-commerce applications.
Computer Networks, 2007

A Streaming Machine Description and Programming Model.
Proceedings of the Embedded Computer Systems: Architectures, 2007

A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

An Experimental Evaluation of the New OpenMP Tasking Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Transactional Memory and OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

A Proposal for Task Parallelism in OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Support for OpenMP tasks in Nanos v4.
Proceedings of the 2007 conference of the Centre for Advanced Studies on Collaborative Research, 2007

2006
Running OpenMP applications efficiently on an everything-shared SDSM.
J. Parallel Distrib. Comput., 2006

Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications.
J. Parallel Distrib. Comput., 2006

Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture.
J. Embedded Computing, 2006

Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors.
Computer Architecture Letters, 2006

Runtime Address Space Computation for SDSM Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

A Proposal for Error Handling in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

Techniques supporting threadprivate in OpenMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Topic 7: Parallel Computer Architecture and Instruction Level Parallelism.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005
Tuning Dynamic Web Applications using Fine-Grain Analysis.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

WAS Control Center: An Autonomic Performance-Triggered Tracing Environment for WebSphere.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Experiences Parallelizing a Web Server with OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Characterizing Secure Dynamic Web Applications Scalability.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Session-Based Adaptive Overload Control for Secure Dynamic Web Applications.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A Hybrid Web Server Architecture for e-Commerce Applications.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

A Hybrid Web Server Architecture for Secure e-Business Web Applications.
Proceedings of the High Performance Computing and Communications, 2005

2004
Register Constrained Modulo Scheduling.
IEEE Trans. Parallel Distrib. Syst., 2004

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures.
International Journal of Parallel Programming, 2004

Dynamic Memory Instruction Bypassing.
International Journal of Parallel Programming, 2004

High-performance and low-power VLIW cores for numerical computations.
IJHPCN, 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.
Proceedings of the Computer Systems: Architectures, 2004

Running OpenMP Applications Efficiently on an Everything-Shared SDSM.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Evaluating the Scalability of Java Event-Driven Web Servers.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
Scaling non-regular shared-memory codes by reusing custom loop schedules.
Scientific Programming, 2003

Automatic multilevel parallelization using OpenMP.
Scientific Programming, 2003

Introduction.
Scientific Programming, 2003

Is the Schedule Clause Really Necessary in OpenMP?
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Evaluation of OpenMP for the Cyclops Multithreaded Architecture.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Complete instrumentation requirements for performance analysis of Web based technologies.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Hierarchical Clustered Register File Organization for VLIW Processors.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Application/Kernel Cooperation Towards the Efficient Execution of Shared-Memory Parallel Java Codes.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Dynamic memory instruction bypassing.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

2002
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors.
J. Parallel Distrib. Comput., 2002

Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models.
International Journal of Parallel Programming, 2002

Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Cost-Effective Compiler Directed Memory Prefetching and Bypassing.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Static and Dynamic Locality Optimizations Using Integer Linear Programming.
IEEE Trans. Parallel Distrib. Syst., 2001

A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors.
IEEE Trans. Parallel Distrib. Syst., 2001

Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures.
IEEE Trans. Computers, 2001

Lifetime-Sensitive Modulo Scheduling in a Production Environment.
IEEE Trans. Computers, 2001

New OpenMP directives for irregular data access loops.
Scientific Programming, 2001

Exploiting memory affinity in OpenMP through schedule reuse.
SIGARCH Computer Architecture News, 2001

Strategies for the efficient exploitation of loop-level parallelism in Java.
Concurrency and Computation: Practice and Experience, 2001

A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Defining and Supporting Pipelined Executions in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Scaling irregular parallel codes with minimal programming effort.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Modulo scheduling with integrated register spilling for clustered VLIW architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

MIRS: Modulo Scheduling with Integrated Register Spilling.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

A novel renaming mechanism that boosts software prefetching.
Proceedings of the 15th international conference on Supercomputing, 2001

The trade-off between implicit and explicit data distribution in shared-memory programming paradigms.
Proceedings of the 15th international conference on Supercomputing, 2001

Performance Analysis Tools for Parallel Java Applications on Shared-memory Systems.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Complex Pipelined Executions in OpenMP Parallel Applications.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Topic 08+13: Instruction-Level Parallelism and Computer Architecture.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurrency - Practice and Experience, 2000

Is Data Distribution Necessary in OpenMP?
Proceedings of the Proceedings Supercomputing 2000, 2000

Improved spill code generation for software pipelined loops.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Two-level hierarchical register file organization for VLIW processors.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors.
Proceedings of the Languages, 2000

OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Towards an efficient exploitation of loop-level parallelism in Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A case for user-level dynamic page migration.
Proceedings of the 14th international conference on Supercomputing, 2000

User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

1999
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999

Increasing effective IPC by exploiting distant parallelism.
Proceedings of the 13th international conference on Supercomputing, 1999

An integer linear programming approach for optimizing cache locality.
Proceedings of the 13th international conference on Supercomputing, 1999

Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Quantifying the Benefits of SPECint Distant Parallelism in Simultaneous Multi-Threading Architectures.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Modulo Scheduling with Reduced Register Pressure.
IEEE Trans. Computers, 1998

Tools and Techniques for Automatic Data Layout: A Case Study.
Parallel Computing, 1998

Quantitative Evaluation of Register Pressure on Software Pipelined Loops.
International Journal of Parallel Programming, 1998

Widening Resources: A Cost-effective Technique for Aggressive ILP Architectures.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Resource Widening Versus Replication: Limits and Performance-cost Trade-off.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
High Performance Fortran Implementations: A Survey.
Scientific Programming, 1997

DDT: A Research Tool for Automatic Data Distribution in High Performance Fortran.
Scientific Programming, 1997

Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Analysis of Several Scheduling Algorithms under the Nano-Thread Programming Model.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs.
Proceedings of the 11th international conference on Supercomputing, 1997

1996
Using a 0-1 Integer Programming Model for Automatic Static Data Distribution.
Parallel Processing Letters, 1996

A framework for automatic dynamic data mapping.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

Loop Parallelization: Revisiting Framework of Unimodular Transformations.
Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996

Heuristics for Register-Constrained Software Pipelining.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Data Distribution and Loop Parallelization for Shared-Memory Multiprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

A Library Implementation of the Nano-Threads Programming Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Swing modulo scheduling: a lifetime-sensitive approach.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Conflict-Free Access for Streams in Multimodule Memories.
IEEE Trans. Computers, 1995

Analyzing reference patterns in automatic data distribution tools.
International Journal of Parallel Programming, 1995

A Novel Approach Towards Automatic Data Distribution.
Proceedings of the Proceedings Supercomputing '95, San Diego, CA, USA, December 4-8, 1995, 1995

Quantitative analysis of vector code.
Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

Hypernode reduction modulo scheduling.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Data Redistribution in an Automatic Data Distribution Tool.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Vector Multiprocessors with Arbitrated Memory Access.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Non-Consistent Dual Register Files to Reduce Register Pressure.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Automatic generation of loop scheduling for VLIW.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
Network Synchronization and Out-of-Order Access to Vectors.
Parallel Processing Letters, 1994

Access To Vectors In Multi-module Memories.
Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994

Detecting and Using Affinity in an Automatic Data Distribution Tool.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Synchronized access to streams in SIMD vector multiprocessors.
Proceedings of the 8th international conference on Supercomputing, 1994

Memory Access Synchronization in Vector Multiprocessors.
Proceedings of the Parallel Processing: CONPAR 94, 1994

Using Sacks to Organize Registers in VLIW Machines.
Proceedings of the Parallel Processing: CONPAR 94, 1994

1993
Conflict-free access to streams in multiprocessor systems.
Microprocessing and Microprogramming, 1993

Access to streams in multiprocessor systems.
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993

Align and Distribute-based Linear Loop Transformations.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

Partitioning the Statement per Iteration Space Using Non-Singular Matrices.
Proceedings of the 7th international conference on Supercomputing, 1993

1992
Increasing the Number of Strides for Conflict-Free Vector Access.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Conflict-free access of vectors with power-of-two strides.
Proceedings of the 6th international conference on Supercomputing, 1992

1991
Conflict-Free Strides for Vectors in Matched Memories.
Parallel Processing Letters, 1991

Balanced Loop Partitioning Using GTS.
Proceedings of the Languages and Compilers for Parallel Computing, 1991

On Automatic Loop Data-Mapping for Distributed-Memory Multiprocessors.
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

1989
GTS: parallelization and vectorization of tight recurrences.
Proceedings of the Proceedings Supercomputing '89, Reno, NV, USA, November 12-17, 1989, 1989

GTS: Extracting Full Parallelism Out of DO Loops.
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

