Mateo Valero

Orcid: 0000-0003-2917-2482

Affiliations:

Polytechnic University of Catalonia, Barcelona, Spain
Barcelona Supercomputing Center, Spain

According to our database¹, Mateo Valero authored at least 473 papers between 1982 and 2023.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Awards

ACM Fellow

ACM Fellow 2002, "For contributions to the design of vector, superscalar, and VLIW architectures, and technical leadership."

IEEE Fellow

IEEE Fellow 2001, "For contributions to the design of vector architectures and superscalar processors."

Bibliography

2023

Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications.

ACM Trans. Archit. Code Optim., June, 2023

VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing.

,

Iván Vargas Valdivieso

,

,

,

,

,

,

Adrián Cristal

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Sargantana: An Academic SoC RISC-V Processor in 22nm FDSOI Technology.

Proceedings of the 38th Conference on Design of Circuits and Integrated Systems, 2023

2022

Adaptable Register File Organization for Vector Processors.

Cristóbal Ramírez Lazo

,

Enrico Reggiani

,

Carlos Rojas Morales

,

Roger Figueras Bagué

,

Luis A. Villa Vargas

,

Marco Antonio Ramírez Salinas

,

Mateo Valero Cortés

,

Osman Sabri Unsal

,

Adrián Cristal

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

DVINO: A RISC-V Vector Processor Implemented in 65nm Technology.

Proceedings of the 37th Conference on Design of Circuits and Integrated Systems, 2022

2021

When Sally Met Harry or When AI Met HPC.

,

Eduardo Ulises Moya-Sánchez

,

Supercomput. Front. Innov., 2021

The Ultimate DataFlow for Ultimate SuperComputers-on-a-Chip, for Scientific Computing, Geo Physics, Complex Mathematics, and Information Processing.

Proceedings of the 10th Mediterranean Conference on Embedded Computing, 2021

VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations.

,

Iván Vargas Valdivieso

,

Adrián Barredo

,

,

,

,

,

,

Adrián Cristal

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy.

,

,

,

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies.

Adrián Barredo

,

Juan M. Cebrian

,

,

,

J. Supercomput., 2020

Using Arm's scalable vector extension on stencil codes.

Adrià Armejach

,

,

Juan M. Cebrian

,

Rubén Langarita

,

Rekai González-Alberquilla

,

Chris Adeniyi-Jones

,

,

,

J. Supercomput., 2020

Advances in the Hierarchical Emergent Behaviors (HEB) Approach to Autonomous Vehicles.

,

Rodolfo A. Milito

,

Mario Nemirovsky

,

IEEE Intell. Transp. Syst. Mag., 2020

Semi-automatic validation of cycle-accurate simulation infrastructures: The case for gem5-x86.

Juan M. Cebrian

,

Adrián Barredo

,

,

,

,

Future Gener. Comput. Syst., 2020

The Ultimate DataFlow for Ultimate SuperComputers-on-a-Chips.

Veljko Milutinovic

,

,

,

,

Miljan Djordjevic

,

Kristy Yoshimoto

,

,

CoRR, 2020

Runtime-guided ECC protection using online estimation of memory vulnerability.

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2020

RICH: implementing reductions in the cache hierarchy.

,

,

,

,

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Improving Accuracy and Speeding Up Document Image Classification Through Parallel Systems.

Javier Ferrando

,

Juan Luis Domínguez

,

,

,

,

,

,

Proceedings of the Computational Science - ICCS 2020, 2020

Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions.

Adrián Barredo

,

Juan M. Cebrian

,

,

,

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

An Academic RISC-V Silicon Implementation Based on Open-Source Components.

Proceedings of the XXXV Conference on Design of Circuits and Integrated Systems, 2020

2019

A Hardware Runtime for Task-Based Programming Models.

,

,

Carlos Álvarez

,

Daniel Jiménez-González

,

Eduard Ayguadé

,

IEEE Trans. Parallel Distributed Syst., 2019

On the maturity of parallel applications for asymmetric multi-core processors.

Kallia Chronaki

,

,

,

,

,

Eduard Ayguadé

,

J. Parallel Distributed Comput., 2019

Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications.

,

,

,

,

Hironori Kasahara

,

Int. J. Parallel Program., 2019

The international race towards Exascale in Europe.

Fabrizio Gagliardi

,

,

,

CCF Trans. High Perform. Comput., 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.

Emilio Castillo

,

,

,

,

,

,

,

Abhinav Bhatele

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

A Vulnerability Factor for ECC-protected Memory.

,

,

,

Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

Power efficient job scheduling by predicting the impact of processor manufacturing variability.

Dimitrios Chasapis

,

,

,

,

,

Proceedings of the ACM International Conference on Supercomputing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.

Emilio Castillo

,

,

,

,

,

,

,

Abhinav Bhatele

Proceedings of the ACM International Conference on Supercomputing, 2019

POSTER: An Optimized Predication Execution for SIMD Extensions.

Adrián Barredo

,

Juan M. Cebrian

,

,

,

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add.

,

,

,

Osman Sabri Unsal

,

Adrián Cristal

,

IEEE Trans. Very Large Scale Integr. Syst., 2018

Asynchronous and Exact Forward Recovery for Detected Errors in Iterative Solvers.

,

,

Eduard Ayguadé

,

,

,

IEEE Trans. Parallel Distributed Syst., 2018

Reducing Cache Coherence Traffic with a NUMA-Aware Runtime Approach.

,

,

,

,

,

IEEE Trans. Parallel Distributed Syst., 2018

Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.

,

Diego Caballero

,

Juan M. Cebrian

,

,

,

,

Xavier Martorell

,

J. Supercomput., 2018

A General Guide to Applying Machine Learning to Computer Architecture.

Daniel Nemirovsky

,

,

Nikola Markovic

,

Mario Nemirovsky

,

,

Adrián Cristal

,

Supercomput. Front. Innov., 2018

Memory Vulnerability: A Case for Delaying Error Reporting.

,

,

,

CoRR, 2018

Runtime-assisted cache coherence deactivation in task parallel programs.

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2018

Graph partitioning applied to DAG scheduling to reduce NUMA effects.

Isaac Sánchez Barrera

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Runtime Aware Architectures.

Mateo Valero Cortés

Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2018

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.

Isaac Sánchez Barrera

,

,

Eduard Ayguadé

,

,

,

Proceedings of the 32nd International Conference on Supercomputing, 2018

Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs.

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 32nd International Conference on Supercomputing, 2018

Architectural Support for Task Dependence Management with Flexible Software Scheduling.

Emilio Castillo

,

,

,

,

Enrique Vallejo

,

José Luis Bosque

,

,

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Stencil codes on a vector length agnostic architecture.

Adrià Armejach

,

,

Juan M. Cebrian

,

Rekai González-Alberquilla

,

Chris Adeniyi-Jones

,

,

,

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Task Scheduling Techniques for Asymmetric Multi-Core Systems.

Kallia Chronaki

,

,

,

,

,

Eduard Ayguadé

,

,

IEEE Trans. Parallel Distributed Syst., 2017

An Integrated Vector-Scalar Design on an In-Order ARM Core.

,

,

,

,

Adrián Cristal

,

,

ACM Trans. Archit. Code Optim., 2017

Determinism at Standard-Library Level in TM-Based Applications.

Vesna Smiljkovic

,

Osman S. Ünsal

,

Adrián Cristal

,

Int. J. Parallel Program., 2017

A scalable synthetic traffic model of Graph500 for computer networks analysis.

,

,

Enrique Vallejo

,

José Luis Bosque

,

,

,

Germán Rodríguez

,

,

Cyriel Minkenberg

,

Concurr. Comput. Pract. Exp., 2017

SEDEA: A Sensible Approach to Account DRAM Energy in Multicore Systems.

,

,

,

Francisco J. Cazorla

,

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

iQ: An Efficient and Flexible Queue-Based Simulation Framework.

,

Daniel Nemirovsky

,

,

,

,

Mario Nemirovsky

Proceedings of the 25th IEEE International Symposium on Modeling, 2017

General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models.

,

,

,

Carlos Álvarez

,

Daniel Jiménez-González

,

Eduard Ayguadé

,

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

FlexVC: Flexible Virtual Channel Management in Low-Diameter Networks.

,

Enrique Vallejo

,

,

Cyriel Minkenberg

,

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

ATM: Approximate Task Memoization in the Runtime System.

,

,

,

,

Gurindar S. Sohi

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models.

,

,

,

Carlos Álvarez

,

Daniel Jiménez-González

,

Eduard Ayguadé

,

Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Fog Function Virtualization: A flexible solution for IoT applications.

,

Josue V. Quiroga

,

,

Mario Nemirovsky

Proceedings of the Second International Conference on Fog and Mobile Edge Computing, 2017

Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.

Lluís Vilanova

,

,

,

,

Proceedings of the Twelfth European Conference on Computer Systems, 2017

To Distribute or Not to Distribute: The Question of Load Balancing for Performance or Energy.

Esteban Stafford

,

,

José Luis Bosque

,

,

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals.

,

,

,

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

A Deep Learning Mapper (DLM) for Scheduling on Heterogeneous Systems.

Daniel Nemirovsky

,

,

Nikola Markovic

,

Mario Nemirovsky

,

,

Adrián Cristal

,

Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

2016

DReAM: An Approach to Estimate per-Task DRAM Energy in Multicore Systems.

,

,

,

Francisco J. Cazorla

,

ACM Trans. Design Autom. Electr. Syst., 2016

Network unfairness in dragonfly topologies.

,

Enrique Vallejo

,

Cristobal Camarero

,

,

J. Supercomput., 2016

Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach.

Petar Radojkovic

,

Paul M. Carpenter

,

,

Vladimir Cakarevic

,

,

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

IEEE Trans. Computers, 2016

Sensible Energy Accounting with Abstract Metering for Multicore Systems.

,

,

,

Francisco J. Cazorla

,

Daniel A. Jiménez

,

ACM Trans. Archit. Code Optim., 2016

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite.

Dimitrios Chasapis

,

,

,

,

Eduard Ayguadé

,

,

ACM Trans. Archit. Code Optim., 2016

Emergent Behaviors in the Internet of Things: The Ultimate Ultra-Large-Scale System.

,

Daniel Nemirovsky

,

Mario Nemirovsky

,

Rodolfo A. Milito

,

IEEE Micro, 2016

Alya: Multiphysics engineering simulation toward exascale.

Mariano Vázquez

,

Guillaume Houzeaux

,

,

Antoni Artigues

,

Jazmin Aguado-Sierra

,

,

,

,

Fernando M. Cucchietti

,

Herbert Coppola-Owen

,

,

Evan Dering Burness

,

José María Cela

,

J. Comput. Sci., 2016

Interconnection Networks in Petascale Computer Systems: A Survey.

,

Radivoje Vasiljevic

,

,

Veljko Milutinovic

,

,

ACM Comput. Surv., 2016

The mont-blanc prototype: an alternative approach for HPC systems.

Proceedings of the International Conference for High Performance Computing, 2016

MUSA: a multi-level simulation approach for next-generation HPC machines.

,

,

Adrià Armejach

,

,

Eduard Ayguadé

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2016

Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models.

,

,

Daniel Jiménez-González

,

Carlos Álvarez-Martínez

,

Eduard Ayguadé

,

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques.

,

,

,

,

Adrián Cristal

,

Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016

Future Vector Microprocessor Extensions for Data Aggregations.

,

,

,

Adrián Cristal

,

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

CATA: Criticality Aware Task Acceleration for Multicore Processors.

Emilio Castillo

,

,

,

,

Enrique Vallejo

,

Kallia Chronaki

,

,

José Luis Bosque

,

,

Eduard Ayguadé

,

,

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.

Dimitrios Chasapis

,

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 2016 International Conference on Supercomputing, 2016

POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core.

,

,

,

,

,

Adrián Cristal

,

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software.

Kallia Chronaki

,

,

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.

,

,

,

Hervé Gloaguen

,

,

Eduard Ayguadé

,

,

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

On-the-fly adaptive routing for dragonfly interconnection networks.

,

Enrique Vallejo

,

,

Cristobal Camarero

,

,

Germán Rodríguez

,

Cyriel Minkenberg

J. Supercomput., 2015

Reimagining Heterogeneous Computing: A Functional Instruction-Set Architecture Computing Model.

Daniel Nemirovsky

,

Nikola Markovic

,

,

,

Adrián Cristal

IEEE Micro, 2015

Kernel-to-User-Mode Transition-Aware Hardware Scheduling.

Nikola Markovic

,

Daniel Nemirovsky

,

,

,

Adrián Cristal

IEEE Micro, 2015

New Benchmarking Methodology and Programming Model for Big Data Processing.

,

,

,

Nemanja Trifunovic

,

,

Veljko Milutinovic

Int. J. Distributed Sens. Networks, 2015

Picos: A hardware runtime architecture support for OmpSs.

Fahimeh Yazdanpanah

,

Carlos Álvarez

,

Daniel Jiménez-González

,

,

Future Gener. Comput. Syst., 2015

Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7.

,

Cristobal Ortega

,

,

,

CoRR, 2015

Thread Lock Section-Aware Scheduling on Asymmetric Single-ISA Multi-Core.

Nikola Markovic

,

Daniel Nemirovsky

,

Osman S. Ünsal

,

,

Adrián Cristal

IEEE Comput. Archit. Lett., 2015

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers.

,

,

,

Eduard Ayguadé

,

,

Proceedings of the International Conference for High Performance Computing, 2015

Performance and Energy Efficient Hardware-Based Scheduler for Symmetric/Asymmetric CMPs.

Nikola Markovic

,

Daniel Nemirovsky

,

,

,

Adrián Cristal

Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Imposing coarse-grained reconfiguration to general purpose processors.

,

,

,

,

,

Adrián Cristal

,

,

Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.

,

,

,

Dimitrios Chasapis

,

,

Xavier Martorell

,

Eduard Ayguadé

,

,

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors.

,

,

,

,

,

,

Adrián Cristal

,

Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.

,

Lluís Vilanova

,

,

,

,

Xavier Martorell

,

,

Eduard Ayguadé

,

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Contention-Based Nonminimal Adaptive Routing in High-Radix Networks.

,

Enrique Vallejo

,

,

,

Germán Rodríguez

,

Cyriel Minkenberg

,

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures.

Kallia Chronaki

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Increasing multicore system efficiency through intelligent bandwidth shifting.

Víctor Jiménez

,

Alper Buyuktosunoglu

,

,

Francis P. O'Connell

,

Francisco J. Cazorla

,

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors.

,

,

,

Adrián Cristal

,

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Hardware Round-Robin Scheduler for Single-ISA Asymmetric Multi-core.

Nikola Markovic

,

Daniel Nemirovsky

,

Veljko Milutinovic

,

,

,

Adrián Cristal

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Runtime-Aware Architectures.

,

,

,

Emilio Castillo

,

Dimitrios Chasapis

,

,

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Throughput Unfairness in Dragonfly Networks under Realistic Traffic Patterns.

,

Enrique Vallejo

,

Cristobal Camarero

,

,

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Spark deployment and performance evaluation on the MareNostrum supercomputer.

,

Anastasios Gounaris

,

Carlos Tripiana

,

,

,

Eduard Ayguadé

,

,

Yolanda Becerra

,

,

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.

,

,

,

Emilio Castillo

,

Xavier Martorell

,

,

Eduard Ayguadé

,

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Analyzing the Efficiency of L1 Caches for Reliable Hybrid-Voltage Operation Using EDC Codes.

,

,

IEEE Trans. Very Large Scale Integr. Syst., 2014

Hybrid Cache Designs for Reliable Hybrid High and Ultra-Low Voltage Operation.

,

,

Francisco J. Cazorla

,

ACM Trans. Design Autom. Electr. Syst., 2014

Runtime-Aware Architectures: A First Approach.

,

,

,

Eduard Ayguadé

,

Supercomput. Front. Innov., 2014

TERAFLUX: Harnessing dataflow in next generation teradevices.

Microprocess. Microsystems, 2014

Using Dynamic Runtime Testing for Rapid Development of Architectural Simulators.

,

Adrián Cristal

,

,

Int. J. Parallel Program., 2014

Editorial.

,

Computación y Sistemas, 2014

Per-task Energy Accounting in Computing Systems.

,

Víctor Jiménez

,

,

,

Francisco J. Cazorla

,

IEEE Comput. Archit. Lett., 2014

Automatic Exploration of Potential Parallelism in Sequential Applications.

Vladimir Subotic

,

Eduard Ayguadé

,

,

Proceedings of the Supercomputing - 29th International Conference, 2014

DeTrans: Deterministic and Parallel execution of Transactions.

Vesna Smiljkovic

,

,

Christof Fetzer

,

,

Adrián Cristal

,

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.

,

,

,

,

,

Adrián Cristal

,

,

,

Alexander V. Veidenbaum

Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

PAMS: Pattern Aware Memory System for embedded systems.

Tassadaq Hussain

,

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Shakaib A. Gursal

Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

Physical vs. Physically-Aware Estimation Flow: Case Study of Design Space Exploration of Adders.

,

,

,

,

Adrián Cristal

,

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

CODOMs: Protecting software with Code-centric memory Domains.

Lluís Vilanova

,

Muli Ben-Yehuda

,

,

,

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Enabling preemptive multiprogramming on GPUs.

,

,

,

,

,

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Scaling Irregular Applications through Data Aggregation and Software Multithreading.

Alessandro Morari

,

,

Daniel G. Chavarría-Miranda

,

,

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Big Data Processing: Data Flow vs Control Flow (New Benchmarking Methodology).

,

Sao Tomac Jakob

,

Nemanja Trifunovic

,

,

Veljko Milutinovic

Proceedings of the International Conference on Identification, 2014

Author retrospective for software trace cache.

,

,

Oliverio J. Santana

,

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi.

,

,

,

,

,

Adrián Cristal

,

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications.

Tassadaq Hussain

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

AMMC: Advanced Multi-Core Memory Controller.

Tassadaq Hussain

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Shakaib A. Gursal

Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

MAPC: Memory access pattern based controller.

Tassadaq Hussain

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

APMC: advanced pattern based memory controller (abstract only).

Tassadaq Hussain

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Santhosh Kumar Rethinagiri

Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

DReAM: Per-Task DRAM Energy Metering in Multicore Systems.

,

,

,

Francisco J. Cazorla

,

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

EVX: Vector execution on low power EDGE cores.

,

,

,

,

Adrián Cristal

,

,

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Dynamic transaction coalescing.

,

Vasileios Karakostas

,

Vesna Smiljkovic

,

Vladimir Gajinov

,

,

Adrián Cristal

,

Proceedings of the Computing Frontiers Conference, CF'14, 2014

Characterizing the Communication Demands of the Graph500 Benchmark on a Commodity Cluster.

,

José Luis Bosque

,

,

,

Cyriel Minkenberg

Proceedings of the 1st IEEE/ACM International Symposium on Big Data Computing, 2014

PVMC: Programmable Vector Memory Controller.

Tassadaq Hussain

,

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Stand-Alone Memory Controller for Graphics System.

Tassadaq Hussain

,

,

Osman S. Ünsal

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013

Thread Assignment of Multithreaded Network Applications in Multicore/Multithreaded Processors.

Petar Radojkovic

,

Vladimir Cakarevic

,

,

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

IEEE Trans. Parallel Distributed Syst., 2013

SMT Malleability in IBM POWER5 and POWER6 Processors.

Alessandro Morari

,

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

,

Alper Buyuktosunoglu

,

,

IEEE Trans. Computers, 2013

Profile-guided transaction coalescing - lowering transactional overheads by merging transactions.

,

Vesna Smiljkovic

,

,

Adrián Cristal

,

ACM Trans. Archit. Code Optim., 2013

Fair CPU time accounting in CMP+SMT processors.

,

,

Francisco J. Cazorla

,

ACM Trans. Archit. Code Optim., 2013

Hardware support for accurate per-task energy metering in multicore systems.

,

,

Víctor Jiménez

,

,

Francisco J. Cazorla

,

ACM Trans. Archit. Code Optim., 2013

Programmability and portability for exascale: Top down programming methodology and tools with StarSs.

Vladimir Subotic

,

Steffen Brinkmann

,

Vladimir Marjanovic

,

,

,

Christoph Niethammer

,

Eduard Ayguadé

,

,

J. Comput. Sci., 2013

Moving from petaflops to petadata.

Michael J. Flynn

,

,

Veljko M. Milutinovic

,

Goran Rakocevic

,

,

,

Commun. ACM, 2013

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?

,

Paul M. Carpenter

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2013

Identifying Critical Code Sections in Dataflow Programming Models.

Vladimir Subotic

,

José Carlos Sancho

,

,

Proceedings of the 21st Euromicro International Conference on Parallel, 2013

On the selection of adder unit in energy efficient vector processing.

,

,

,

,

Adrián Cristal

,

Proceedings of the International Symposium on Quality Electronic Design, 2013

Trace filtering of multithreaded applications for CMP memory simulation.

,

,

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

TM-dietlibc: A TM-aware Real-World System Library.

Vesna Smiljkovic

,

,

,

,

Osman S. Ünsal

,

Adrián Cristal

,

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

HPC System Software for Regular and Irregular Parallel Applications.

Alessandro Morari

,

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

HPCS 2013 panel: The era of exascale sciences: Challenges, needs and requirements.

Franck Cappello

,

Wolfgang Gentzsch

,

,

Proceedings of the International Conference on High Performance Computing & Simulation, 2013

Efficient Routing Mechanisms for Dragonfly Networks.

,

Enrique Vallejo

,

,

Miguel Odriozola

,

Proceedings of the 42nd International Conference on Parallel Processing, 2013

EcoTM: Conflict-Aware Economical Unbounded Hardware Transactional Memory.

,

,

Adrián Cristal

,

,

Proceedings of the International Conference on Computational Science, 2013

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management.

,

Enrique Vallejo

,

,

,

Germán Rodríguez

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Global misrouting policies in two-level hierarchical networks.

,

Enrique Vallejo

,

,

Miguel Odriozola

,

Cristobal Camarero

,

,

,

Germán Rodríguez

Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip, 2013

The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices.

Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

Efficient cache architectures for reliable hybrid voltage operation using EDC codes.

,

,

Proceedings of the Design, Automation and Test in Europe, 2013

APPLE: adaptive performance-predictable low-energy caches for reliable hybrid voltage operation.

[BibT_eX]

[DOI]

,

,

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Killer-mobiles: The way towards energy efficient high performance computers?

[BibT_eX]

[DOI]

Proceedings of the 13th International Conference on Application of Concurrency to System Design, 2013

2012

CPU Accounting for Multicore Processors.

[BibT_eX]

[DOI]

,

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Alper Buyuktosunoglu

,

IEEE Trans. Computers, 2012

Dynamic Tolerance Region Computing for Multimedia.

[BibT_eX]

[DOI]

Carlos Álvarez

,

,

IEEE Trans. Computers, 2012

On the simulation of large-scale architectures using multiple application abstraction levels.

[BibT_eX]

[DOI]

,

Felipe Cabarcas

,

Carlos Villavieja

,

,

,

,

,

ACM Trans. Archit. Code Optim., 2012

Hardware transactional memory with software-defined conflicts.

[BibT_eX]

[DOI]

J. Rubén Titos Gil

,

Manuel E. Acacio

,

José M. García

,

,

Adrián Cristal

,

,

,

ACM Trans. Archit. Code Optim., 2012

Parallel job scheduling for power constrained HPC systems.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

Parallel Comput., 2012

The Problem of Evaluating CPU-GPU Systems with 3D Visualization Applications.

[BibT_eX]

[DOI]

,

,

IEEE Micro, 2012

Resource-bounded multicore emulation using Beefarm.

[BibT_eX]

[DOI]

,

,

,

,

,

Adrián Cristal

,

,

Microprocess. Microsystems, 2012

Understanding the future of energy-performance trade-off via DVFS in HPC environments.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

J. Parallel Distributed Comput., 2012

Circuit design of a dual-versioning L1 data cache.

[BibT_eX]

[DOI]

,

Adrià Armejach

,

Adrián Cristal

,

,

,

Integr., 2012

Profiling and Optimizing Transactional Memory Applications.

[BibT_eX]

[DOI]

Ferad Zyulkyarov

,

,

,

,

Adrián Cristal

,

,

Int. J. Parallel Program., 2012

The Network Adapter: The Missing Link between MPI Applications and Network Performance.

[BibT_eX]

[DOI]

Germán Rodríguez

,

Cyriel Minkenberg

,

Ronald P. Luijten

,

,

Patrick Geoffray

,

,

,

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Efficient Sorting on the Tilera Manycore Architecture.

[BibT_eX]

[DOI]

Alessandro Morari

,

,

,

,

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Novel SRAM bias control circuits for a low power L1 data cache.

[BibT_eX]

[DOI]

,

Adrià Armejach

,

Adrián Cristal

,

,

Proceedings of the NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012, 2012

Vector Extensions for Decision Support DBMS Acceleration.

[BibT_eX]

[DOI]

,

,

,

Adrián Cristal

,

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Improving Cache Management Policies Using Dynamic Reuse Distances.

[BibT_eX]

[DOI]

,

,

,

Rosario Cammarota

,

,

Alexander V. Veidenbaum

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Evaluating the Impact of TLB Misses on Future HPC Systems.

[BibT_eX]

[DOI]

Alessandro Morari

,

Roberto Gioiosa

,

Robert W. Wisniewski

,

Bryan S. Rosenburg

,

,

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Enhancing the performance of assisted execution runtime systems through hardware/software techniques.

[BibT_eX]

[DOI]

,

Roberto Gioiosa

,

,

Adrián Cristal

,

Proceedings of the International Conference on Supercomputing, 2012

On-the-Fly Adaptive Routing in High-Radix Hierarchical Networks.

[BibT_eX]

[DOI]

,

Enrique Vallejo

,

,

Miguel Odriozola

,

Cristobal Camarero

,

,

Germán Rodríguez

,

,

Cyriel Minkenberg

Proceedings of the 41st International Conference on Parallel Processing, 2012

ADAM: an efficient data management mechanism for hybrid high and ultra-low voltage operation caches.

[BibT_eX]

[DOI]

,

,

Proceedings of the Great Lakes Symposium on VLSI 2012, 2012

TagTM - accelerating STMs with hardware tags for fast meta-data access.

[BibT_eX]

[DOI]

,

,

Ferad Zyulkyarov

,

Adrián Cristal

,

Osman S. Ünsal

,

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Optimal task assignment in multithreaded processors: a statistical approach.

[BibT_eX]

[DOI]

Petar Radojkovic

,

Vladimir Cakarevic

,

,

,

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

Assessing Accelerator-Based HPC Reverse Time Migration.

[BibT_eX]

[DOI]

Mauricio Araya-Polo

,

,

Mauricio Hanzich

,

Miquel Pericàs

,

,

,

Muhammad Shafiq

,

,

,

Eduard Ayguadé

,

José María Cela

,

IEEE Trans. Parallel Distributed Syst., 2011

Dynamic Cache Partitioning Based on the MLP of Cache Misses.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Trans. High Perform. Embed. Archit. Compil., 2011

A Highly Scalable Parallel Implementation of H.264.

[BibT_eX]

[DOI]

Arnaldo Azevedo

,

Ben H. H. Juurlink

,

Cor Meenderinck

,

Andrei Sergeevich Terechko

,

Jan Hoogerbrugge

,

Mauricio Alvarez

,

,

Trans. High Perform. Embed. Archit. Compil., 2011

RMS-TM: a comprehensive benchmark suite for transactional memory systems (abstracts only).

[BibT_eX]

[DOI]

,

Vasileios Karakostas

,

,

Adrián Cristal

,

,

SIGMETRICS Perform. Evaluation Rev., 2011

Exploiting intra-task slack time of load operations for DVFS in hard real-time multi-core systems.

[BibT_eX]

[DOI]

Eduardo Quiñones

,

,

Francisco J. Cazorla

,

SIGBED Rev., 2011

Energy-Aware Accounting and Billing in Large-Scale Computing Facilities.

[BibT_eX]

[DOI]

Víctor Jiménez

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

,

,

Alper Buyuktosunoglu

,

,

IEEE Micro, 2011

Simulating Whole Supercomputer Applications.

[BibT_eX]

[DOI]

,

,

,

,

,

,

IEEE Micro, 2011

Hybrid Transactional Memory with Pessimistic Concurrency Control.

[BibT_eX]

[DOI]

Enrique Vallejo

,

Sutirtha Sanyal

,

,

Fernando Vallejo

,

,

,

Adrián Cristal

,

Int. J. Parallel Program., 2011

The International Exascale Software Project roadmap.

[BibT_eX]

[DOI]

Jack J. Dongarra

,

Peter H. Beckman

,

,

,

Giovanni Aloisio

,

Jean-Claude Andre

,

,

Jean-Yves Berthou

,

,

Bertrand Braunschweig

,

Franck Cappello

,

Barbara M. Chapman

,

,

Alok N. Choudhary

,

Sudip S. Dosanjh

,

Thom H. Dunning

,

,

,

,

Robert J. Harrison

,

,

Michael A. Heroux

,

,

,

,

Yutaka Ishikawa

,

,

,

,

,

,

,

Alain Lichnewsky

,

,

,

,

Satoshi Matsuoka

,

,

Peter Michielse

,

,

Matthias S. Müller

,

Wolfgang E. Nagel

,

Hiroshi Nakashima

,

Michael E. Papka

,

,

,

,

,

,

,

Thomas L. Sterling

,

,

Frederick H. Streitz

,

,

Shinji Sumimoto

,

William M. Tang

,

,

,

Anne E. Trefethen

,

,

Aad J. van der Steen

,

Jeffrey S. Vetter

,

,

Robert W. Wisniewski

,

Katherine A. Yelick

Int. J. High Perform. Comput. Appl., 2011

Characterizing Power and Temperature Behavior of POWER6-Based System.

[BibT_eX]

[DOI]

Víctor Jiménez

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

,

,

,

,

,

Alper Buyuktosunoglu

,

IEEE J. Emerg. Sel. Topics Circuits Syst., 2011

Scalable multicore architectures for long DNA sequence comparison.

[BibT_eX]

[DOI]

Friman Sánchez

,

Felipe Cabarcas

,

,

Concurr. Comput. Pract. Exp., 2011

RMS-TM: a comprehensive benchmark suite for transactional memory systems.

[BibT_eX]

[DOI]

,

Vasileios Karakostas

,

,

Adrián Cristal

,

,

Proceedings of the ICPE'11, 2011

Rapid Development of Error-Free Architectural Simulators Using Dynamic Runtime Testing.

[BibT_eX]

[DOI]

,

Adrián Cristal

,

Osman S. Ünsal

,

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Breaking the bandwidth wall in chip multiprocessors.

[BibT_eX]

[DOI]

,

Felipe Cabarcas

,

,

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

IA^3: An Interference Aware Allocation Algorithm for Multicore Hard Real-Time Systems.

[BibT_eX]

[DOI]

,

Eduardo Quiñones

,

Francisco J. Cazorla

,

Robert I. Davis

,

Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium, 2011

The Impact of Application's Micro-Imbalance on the Communication-Computation Overlap.

[BibT_eX]

[DOI]

Vladimir Subotic

,

José Carlos Sancho

,

,

Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Hybrid Parallel Programming with MPI/StarSs.

[BibT_eX]

[DOI]

,

Vladimir Marjanovic

,

Eduard Ayguadé

,

,

Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

An Abstraction Methodology for the Evaluation of Multi-core Multi-threaded Architectures.

[BibT_eX]

[DOI]

,

,

Jorge García-Vidal

,

Mario Nemirovsky

,

Rodolfo A. Milito

,

Proceedings of the MASCOTS 2011, 2011

Trace-driven simulation of multithreaded applications.

[BibT_eX]

[DOI]

,

Alejandro Duran

,

Felipe Cabarcas

,

,

,

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

A Quantitative Analysis of OS Noise.

[BibT_eX]

[DOI]

Alessandro Morari

,

Roberto Gioiosa

,

Robert W. Wisniewski

,

Francisco J. Cazorla

,

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

RVC-based time-predictable faulty caches for safety-critical systems.

[BibT_eX]

[DOI]

,

Eduardo Quiñones

,

Francisco J. Cazorla

,

,

Proceedings of the 17th IEEE International On-Line Testing Symposium (IOLTS 2011), 2011

Linear programming based parallel job scheduling for power constrained systems.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

FIMSIM: A fault injection infrastructure for microarchitectural simulators.

[BibT_eX]

[DOI]

,

,

Adrián Cristal

,

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

RVC: a mechanism for time-analyzable real-time processors with faulty caches.

[BibT_eX]

[DOI]

,

Eduardo Quiñones

,

Francisco J. Cazorla

,

,

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

Circuit design of a dual-versioning L1 data cache for optimistic concurrency.

[BibT_eX]

[DOI]

,

Adrià Armejach

,

Adrián Cristal

,

,

,

Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2010, 2011

TMbox: A Flexible and Reconfigurable 16-Core Hybrid Transactional Memory System.

[BibT_eX]

[DOI]

,

,

,

,

Adrián Cristal

,

,

,

Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

Quantifying the Potential Task-Based Dataflow Parallelism in MPI Applications.

[BibT_eX]

[DOI]

Vladimir Subotic

,

,

José Carlos Sancho

,

,

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Hybrid high-performance low-power and ultra-low energy reliable caches.

[BibT_eX]

[DOI]

,

,

Francisco J. Cazorla

,

Proceedings of the 8th Conference on Computing Frontiers, 2011

From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype.

[BibT_eX]

[DOI]

,

,

,

,

Adrián Cristal

,

,

,

Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2011

SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory.

[BibT_eX]

[DOI]

,

,

Adrián Cristal

,

,

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems.

[BibT_eX]

[DOI]

,

Roberto Gioiosa

,

,

,

Adrián Cristal

,

,

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory.

[BibT_eX]

[DOI]

Adrià Armejach

,

,

J. Rubén Titos Gil

,

,

Adrián Cristal

,

,

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

On the Problem of Evaluating the Performance of Multiprogrammed Workloads.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

,

Oliverio J. Santana

,

Enrique Fernández

,

IEEE Trans. Computers, 2010

Multicore: The View from Europe.

[BibT_eX]

[DOI]

,

IEEE Micro, 2010

Utilization driven power-aware parallel job scheduling.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

Comput. Sci. Res. Dev., 2010

Trends and techniques for energy efficient architectures.

[BibT_eX]

[DOI]

Víctor Jiménez

,

Roberto Gioiosa

,

,

Francisco J. Cazorla

,

,

Alper Buyuktosunoglu

,

,

Proceedings of the 18th IEEE/IFIP VLSI-SoC 2010, 2010

Debugging programs that use atomic blocks and transactional memory.

[BibT_eX]

[DOI]

Ferad Zyulkyarov

,

,

,

Adrián Cristal

,

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Thread to strand binding of parallel network applications in massive multi-threaded systems.

[BibT_eX]

[DOI]

Petar Radojkovic

,

Vladimir Cakarevic

,

,

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Effective communication and computation overlap with hybrid MPI/SMPSs.

[BibT_eX]

[DOI]

Vladimir Marjanovic

,

,

Eduard Ayguadé

,

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Architectural Support for Fair Reader-Writer Locking.

[BibT_eX]

[DOI]

Enrique Vallejo

,

,

Adrián Cristal

,

,

Fernando Vallejo

,

,

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Task Superscalar: An Out-of-Order Task Pipeline.

[BibT_eX]

[DOI]

,

Felipe Cabarcas

,

,

,

,

Eduard Ayguadé

,

,

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Simulation environment for studying overlap of communication and computation.

[BibT_eX]

[DOI]

Vladimir Subotic

,

,

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Adapting cache partitioning algorithms to pseudo-LRU replacement policies.

[BibT_eX]

[DOI]

Kamil Kedzierski

,

,

Francisco J. Cazorla

,

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

BSLD threshold driven power management policy for HPC centers.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Comparing last-level cache designs for CMP architectures.

[BibT_eX]

[DOI]

,

,

Felipe Cabarcas

,

,

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010

Power and performance aware reconfigurable cache for CMPs.

[BibT_eX]

[DOI]

Kamil Kedzierski

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Alper Buyuktosunoglu

,

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010

Overlapping communication and computation by using a hybrid MPI/SMPSs approach.

[BibT_eX]

[DOI]

Vladimir Marjanovic

,

,

Eduard Ayguadé

,

Proceedings of the 24th International Conference on Supercomputing, 2010

Optimizing job performance under a given power constraint in HPC centers.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

Proceedings of the International Green Computing Conference 2010, 2010

Long DNA Sequence Comparison on Multicore Architectures.

[BibT_eX]

[DOI]

Friman Sánchez

,

Felipe Cabarcas

,

,

Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A Simulation Framework to Automatically Analyze the Communication-Computation Overlap in Scientific Applications.

[BibT_eX]

[DOI]

Vladimir Subotic

,

José Carlos Sancho

,

,

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Designing OS for HPC Applications: Scheduling.

[BibT_eX]

[DOI]

Roberto Gioiosa

,

,

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Scalability Analysis of Progressive Alignment on a Multicore.

[BibT_eX]

[DOI]

Sebastián Isaza

,

Friman Sánchez

,

Georgi Gaydadjiev

,

,

Proceedings of the CISIS 2010, 2010

Load balancing using dynamic cache allocation.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

Rizos Sakellariou

,

Proceedings of the 7th Conference on Computing Frontiers, 2010

Exploiting Inactive Rename Slots for Detecting Soft Errors.

[BibT_eX]

[DOI]

,

,

Osman S. Ünsal

,

Proceedings of the Architecture of Computing Systems, 2010

Discovering and understanding performance bottlenecks in transactional applications.

[BibT_eX]

[DOI]

Ferad Zyulkyarov

,

,

,

,

Adrián Cristal

,

,

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Efficient runahead threads.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

,

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Power and thermal characterization of POWER6 system.

[BibT_eX]

[DOI]

Víctor Jiménez

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

,

,

,

,

,

Alper Buyuktosunoglu

,

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

DIA: A Complexity-Effective Decoding Architecture.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

,

IEEE Trans. Computers, 2009

Available task-level parallelism on the Cell BE.

[BibT_eX]

[DOI]

,

,

Sci. Program., 2009

FlexDCP: a QoS framework for CMP architectures.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Rizos Sakellariou

,

ACM SIGOPS Oper. Syst. Rev., 2009

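Evaluación del rendimiento paralelo en el nivel macro bloque del decodificador H.264 en una arquitectura multiprocesador cc-NUMA [Evaluation of macroblock-level parallel performance of the H.264 decoder on a cc-NUMA multiprocessor architecture].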

[BibT_eX]

[DOI]

Mauricio Alvarez

,

,

,

Arnaldo Azevedo

,

Cor Meenderinck

Rev. Avances en Sistemas Informática, 2009

BSC Vision Towards Exascale.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

Int. J. High Perform. Comput. Appl., 2009

The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community.

[BibT_eX]

[DOI]

Jack J. Dongarra

,

Peter H. Beckman

,

,

Franck Cappello

,

,

Satoshi Matsuoka

,

,

,

,

Anne E. Trefethen

,

Int. J. High Perform. Comput. Appl., 2009

An Analyzable Memory Controller for Hard Real-Time CMPs.

[BibT_eX]

[DOI]

,

Eduardo Quiñones

,

Francisco J. Cazorla

,

IEEE Embed. Syst. Lett., 2009

CPU Accounting in CMP Processors.

[BibT_eX]

[DOI]

,

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Alper Buyuktosunoglu

,

IEEE Comput. Archit. Lett., 2009

Thread to Core Assignment in SMT On-Chip Multiprocessors.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009

Atomic quake: using transactional memory in an interactive multiplayer game server.

[BibT_eX]

[DOI]

Ferad Zyulkyarov

,

Vladimir Gajinov

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Turbocharging boosted transactions or: how I learnt to stop worrying and love longer transactions.

[BibT_eX]

[DOI]

Chinmay Eishan Kulkarni

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

EazyHTM: eager-lazy hardware transactional memory.

[BibT_eX]

[DOI]

,

Cristian Perfumo

,

Chinmay Eishan Kulkarni

,

Adrià Armejach

,

Adrián Cristal

,

,

,

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Characterizing the resource-sharing levels in the UltraSPARC T2 processor.

[BibT_eX]

[DOI]

Vladimir Cakarevic

,

Petar Radojkovic

,

,

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Hardware support for WCET analysis of hard real-time multicore systems.

[BibT_eX]

[DOI]

,

Eduardo Quiñones

,

Francisco J. Cazorla

,

,

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Taking the heat off transactions: Dynamic selection of pessimistic concurrency control.

[BibT_eX]

[DOI]

,

,

Adrián Cristal

,

,

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Clock gate on abort: Towards energy-efficient hardware Transactional Memory.

[BibT_eX]

[DOI]

Sutirtha Sanyal

,

,

Adrián Cristal

,

,

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Power-aware load balancing of large scale MPI applications.

[BibT_eX]

[DOI]

,

Julita Corbalán

,

,

,

Alexander V. Veidenbaum

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A European perspective on supercomputing.

[BibT_eX]

[DOI]

Proceedings of the 23rd International Conference on Supercomputing, 2009

Exploring pattern-aware routing in generalized fat tree networks.

[BibT_eX]

[DOI]

Germán Rodríguez

,

,

Cyriel Minkenberg

,

,

Proceedings of the 23rd International Conference on Supercomputing, 2009

QuakeTM: parallelizing a complex sequential application using transactional memory.

[BibT_eX]

[DOI]

Vladimir Gajinov

,

Ferad Zyulkyarov

,

,

Adrián Cristal

,

Eduard Ayguadé

,

,

Proceedings of the 23rd International Conference on Supercomputing, 2009

Code Semantic-Aware Runahead Threads.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

Proceedings of the ICPP 2009, 2009

Scalability of Macroblock-level Parallelism for H.264 Decoding.

[BibT_eX]

[DOI]

Mauricio Alvarez-Mesa

,

,

Arnaldo Azevedo

,

Cor Meenderinck

,

Ben H. H. Juurlink

,

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Dynamically Filtering Thread-Local Variables in Lazy-Lazy Hardware Transactional Memory.

[BibT_eX]

[DOI]

Sutirtha Sanyal

,

,

Adrián Cristal

,

,

Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

Oblivious routing schemes in extended generalized Fat Tree networks.

[BibT_eX]

[DOI]

Germán Rodríguez

,

Cyriel Minkenberg

,

,

Ronald P. Luijten

,

,

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Quantitative analysis of sequence alignment applications on multiprocessor architectures.

[BibT_eX]

[DOI]

Friman Sánchez

,

,

Proceedings of the 6th Conference on Computing Frontiers, 2009

ITCA: Inter-task Conflict-Aware CPU Accounting for CMPs.

[BibT_eX]

[DOI]

,

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Alper Buyuktosunoglu

,

Proceedings of the PACT 2009, 2009

2008

Multicore Resource Management.

[BibT_eX]

[DOI]

,

,

Francisco J. Cazorla

,

,

,

IEEE Micro, 2008

Nebelung: Execution Environment for Transactional OpenMP.

[BibT_eX]

[DOI]

Milos Milovanovic

,

,

Vladimir Gajinov

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Int. J. Parallel Program., 2008

Power-efficient VLIW design using clustering and widening.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Eduard Ayguadé

,

,

,

Int. J. Embed. Syst., 2008

Vectorized AES Core for High-throughput Secure Environments.

[BibT_eX]

[DOI]

Miquel Pericàs

,

,

Georgi Gaydadjiev

,

Stamatis Vassiliadis

,

Proceedings of the High Performance Computing for Computational Science, 2008

A dynamic scheduler for balancing HPC applications.

[BibT_eX]

[DOI]

,

Roberto Gioiosa

,

Francisco J. Cazorla

,

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Measuring Operating System Overhead on CMT Processors.

[BibT_eX]

[DOI]

Petar Radojkovic

,

Vladimir Cakarevic

,

,

Alejandro Pajuelo

,

Roberto Gioiosa

,

Francisco J. Cazorla

,

Mario Nemirovsky

,

Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

Selection of the Register File Size and the Resource Allocation Policy on SMT Processors.

[BibT_eX]

[DOI]

Jesús Alastruey

,

,

Francisco J. Cazorla

,

Víctor Viñals

,

Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications.

[BibT_eX]

[DOI]

Sebastián Isaza

,

Friman Sánchez

,

Georgi Gaydadjiev

,

,

Proceedings of the Embedded Computer Systems: Architectures, 2008

A distributed processor state management architecture for large-window processors.

[BibT_eX]

[DOI]

Isidro Gonzalez

,

,

Alexander V. Veidenbaum

,

Marco Antonio Ramírez

,

Adrián Cristal

,

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

WormBench: a configurable workload for evaluating transactional memory systems.

[BibT_eX]

[DOI]

Ferad Zyulkyarov

,

Adrián Cristal

,

,

Eduard Ayguadé

,

,

,

Proceedings of the 9th workshop on MEmory performance, 2008

A Two-Level Load/Store Queue Based on Execution Locality.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Adrián Cristal

,

Francisco J. Cazorla

,

Rubén González

,

Alexander V. Veidenbaum

,

Daniel A. Jiménez

,

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Software-Controlled Priority Characterization of POWER5 Processor.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Alper Buyuktosunoglu

,

,

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Balancing HPC applications through smart allocation of resources in MT processors.

[BibT_eX]

[DOI]

,

Roberto Gioiosa

,

Francisco J. Cazorla

,

Julita Corbalán

,

,

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

MFLUSH: Handling Long-Latency Loads in SMT On-Chip Multiprocessors.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Runahead Threads to improve SMT performance.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Supercomputing for the Future, Supercomputing from the Past (Keynote).

[BibT_eX]

[DOI]

,

Proceedings of the High Performance Embedded Architectures and Compilers, 2008

LPA: A First Approach to the Loop Processor Architecture.

[BibT_eX]

[DOI]

Alejandro García

,

Oliverio J. Santana

,

Enrique Fernández

,

,

Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Architecture Performance Prediction Using Evolutionary Artificial Neural Networks.

[BibT_eX]

[DOI]

Pedro A. Castillo

,

Antonio Miguel Mora

,

Juan Julián Merelo Guervós

,

Juan Luis Jiménez Laredo

,

,

Francisco J. Cazorla

,

,

Proceedings of the Applications of Evolutionary Computing, 2008

The limits of software transactional memory (STM): dissecting Haskell STM applications on a many-core environment.

[BibT_eX]

[DOI]

Cristian Perfumo

,

,

,

,

Adrián Cristal

,

,

Proceedings of the 5th Conference on Computing Frontiers, 2008

Evolutionary system for prediction and optimization of hardware architecture performance.

[BibT_eX]

[DOI]

Pedro Ángel Castillo Valdivieso

,

Juan Julián Merelo Guervós

,

,

Francisco J. Cazorla

,

,

Antonio Miguel Mora

,

Juan Luis Jiménez Laredo

,

Proceedings of the IEEE Congress on Evolutionary Computation, 2008

Soft Real-Time Scheduling on SMT Processors with Explicit Resource Allocation.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

Roberto Gioiosa

,

Proceedings of the Architecture of Computing Systems, 2008

MultiLayer processing - an execution model for parallel stateful packet processing.

[BibT_eX]

[DOI]

,

Mario Nemirovsky

,

Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008

2007

Enlarging Instruction Streams.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

IEEE Trans. Computers, 2007

Energy saving through a simple load control mechanism.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

SIGARCH Comput. Archit. News, 2007

Transactional Memory: An Overview.

[BibT_eX]

[DOI]

,

Adrián Cristal

,

,

Eduard Ayguadé

,

Fabrizio Gagliardi

,

,

IEEE Micro, 2007

Explaining Dynamic Cache Partitioning Speed Ups.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

IEEE Comput. Archit. Lett., 2007

unreadTVar: Extending Haskell Software Transactional Memory for Performance.

[BibT_eX]

,

Cristian Perfumo

,

,

Adrián Cristal

,

,

Proceedings of the Eighth Symposium on Trends in Functional Programming, 2007

Online Prediction of Applications Cache Utility.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

On the Problem of Minimizing Workload Execution Time in SMT Processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Enrique Fernández

,

Peter M. W. Knijnenburg

,

,

Rizos Sakellariou

,

Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Multithreaded software transactional memory and OpenMP.

[BibT_eX]

[DOI]

Milos Milovanovic

,

,

Vladimir Gajinov

,

,

Adrián Cristal

,

Eduard Ayguadé

,

Proceedings of the 2007 workshop on MEmory performance, 2007

Transactional Memory and OpenMP.

[BibT_eX]

[DOI]

Milos Milovanovic

,

,

,

Adrián Cristal

,

Xavier Martorell

,

Eduard Ayguadé

,

,

Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications.

[BibT_eX]

[DOI]

Mauricio Alvarez

,

,

,

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Microarchitectural Support for Speculative Register Renaming.

[BibT_eX]

[DOI]

Jesús Alastruey

,

,

Víctor Viñals

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications.

[BibT_eX]

[DOI]

Mauricio Alvarez

,

,

,

Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Hardware Transactional Memory with Operating System Support, HTMOS.

[BibT_eX]

[DOI]

,

Adrián Cristal

,

,

Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007

Implicit Transactional Memory in Kilo-Instruction Multiprocessors.

[BibT_eX]

[DOI]

,

Enrique Vallejo

,

Adrián Cristal

,

Fernando Vallejo

,

,

,

,

Proceedings of the Advances in Computer Systems Architecture, 2007

FAME: FAirly MEasuring Multithreaded Architectures.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Oliverio J. Santana

,

Enrique Fernández

,

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Runahead Threads: Reducing Resource Contention in SMT Processors.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

A Flexible Heterogeneous Multi-Core Architecture.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Adrián Cristal

,

Francisco J. Cazorla

,

Rubén González

,

Daniel A. Jiménez

,

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

MLP-Aware Dynamic Cache Partitioning.

[BibT_eX]

[DOI]

,

Francisco J. Cazorla

,

,

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

A DRAM/SRAM Memory Scheme for Fast Packet Buffers.

[BibT_eX]

[DOI]

Jorge García-Vidal

,

,

Llorenç Cerdà

,

,

IEEE Trans. Computers, 2006

Predictable Performance in SMT Processors: Synergy between the OS and SMTs.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

IEEE Trans. Computers, 2006

Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors.

[BibT_eX]

[DOI]

,

,

,

,

Eduard Ayguadé

IEEE Comput. Archit. Lett., 2006

A simple speculative load control mechanism for energy saving.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

Proceedings of the 2006 workshop on MEmory performance, 2006

Performance Analysis of Sequence Alignment Applications.

[BibT_eX]

[DOI]

Friman Sánchez

,

,

,

Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

A decoupled KILO-instruction processor.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Adrián Cristal

,

Rubén González

,

Daniel A. Jiménez

,

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Kilo-instruction processors, runahead and prefetching.

[BibT_eX]

[DOI]

Tanausú Ramírez

,

,

Oliverio J. Santana

,

Proceedings of the Third Conference on Computing Frontiers, 2006

Speculative early register release.

[BibT_eX]

[DOI]

Jesús Alastruey

,

,

Víctor Viñals

,

Proceedings of the Third Conference on Computing Frontiers, 2006

Branch predictor guided instruction decoding.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

,

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

Software Trace Cache.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

IEEE Trans. Computers, 2005

Fuzzy Memoization for Floating-Point Multimedia Applications.

[BibT_eX]

[DOI]

Carlos Álvarez

,

,

IEEE Trans. Computers, 2005

Dynamic memory interval test vs. interprocedural pointer analysis in multimedia applications.

[BibT_eX]

[DOI]

,

ACM Trans. Archit. Code Optim., 2005

The impact of traffic aggregation on the memory performance of networking applications.

[BibT_eX]

[DOI]

,

Jorge García-Vidal

,

Mario Nemirovsky

,

SIGARCH Comput. Archit. News, 2005

Speculative execution for hiding memory latency.

[BibT_eX]

[DOI]

,

Antonio González

,

SIGARCH Comput. Archit. News, 2005

Better Branch Prediction Through Prophet/Critic Hybrids.

[BibT_eX]

[DOI]

,

,

,

,

IEEE Micro, 2005

Kilo-Instruction Processors: Overcoming the Memory Wall.

[BibT_eX]

[DOI]

Adrián Cristal

,

Oliverio J. Santana

,

Francisco J. Cazorla

,

,

Tanausú Ramírez

,

Miquel Pericàs

,

IEEE Micro, 2005

Hardware support for early register release.

[BibT_eX]

[DOI]

,

Víctor Viñals

,

Antonio González

,

Int. J. High Perform. Comput. Netw., 2005

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications.

[BibT_eX]

[DOI]

Friman Sánchez

,

Mauricio Alvarez

,

,

,

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Performance Analysis of a New Packet Trace Compressor based on TCP Flow Clustering.

[BibT_eX]

[DOI]

,

,

Jorge García-Vidal

,

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Workload Characterization of Stateful Networking Applications.

[BibT_eX]

[DOI]

,

Mario Nemirovsky

,

Jorge García-Vidal

,

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Multiple Stream Prediction.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Decoupled State-Execute Architecture.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Adrián Cristal

,

Rubén González

,

Alexander V. Veidenbaum

,

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Adrián Cristal

,

Rubén González

,

Daniel A. Jiménez

,

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Control-Flow Independence Reuse via Dynamic Vectorization.

[BibT_eX]

[DOI]

,

Antonio González

,

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Effective Instruction Prefetching via Fetch Prestaging.

[BibT_eX]

[DOI]

,

,

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

An asymmetric clustered processor based on value content.

[BibT_eX]

[DOI]

Rubén González

,

Adrián Cristal

,

Miquel Pericàs

,

,

Alexander V. Veidenbaum

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Implementing Kilo-Instruction Multiprocessors.

[BibT_eX]

[DOI]

Enrique Vallejo

,

,

Adrián Cristal

,

Fernando Vallejo

,

,

,

,

Proceedings of the International Conference on Pervasive Services 2005, 2005

A Vector-µSIMD-VLIW Architecture for Multimedia Applications.

[BibT_eX]

[DOI]

,

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A Complexity-Effective Simultaneous Multithreading Architecture.

[BibT_eX]

[DOI]

,

,

,

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A New Pointer-based Instruction Queue Design and Its Power-Performance Evaluation.

[BibT_eX]

[DOI]

Marco Antonio Ramírez

,

Adrián Cristal

,

,

Alexander V. Veidenbaum

,

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Architectural support for real-time task scheduling in SMT processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

Proceedings of the 2005 International Conference on Compilers, 2005

Architectural impact of stateful networking applications.

[BibT_eX]

[DOI]

,

Jorge García-Vidal

,

Mario Nemirovsky

,

Proceedings of the 2005 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2005

2004

Register Constrained Modulo Scheduling.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

IEEE Trans. Parallel Distributed Syst., 2004

Late Allocation and Early Release of Physical Registers.

[BibT_eX]

[DOI]

,

Víctor Viñals

,

José González

,

Antonio González

,

IEEE Trans. Computers, 2004

A low-complexity fetch architecture for high-performance superscalar processors.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

Josep Lluís Larriba-Pey

,

ACM Trans. Archit. Code Optim., 2004

Toward kilo-instruction processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

Oliverio J. Santana

,

,

José F. Martínez

ACM Trans. Archit. Code Optim., 2004

A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

José F. Martínez

,

,

SIGARCH Comput. Archit. News, 2004

QoS for High-Performance SMT Processors in Embedded Systems.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

,

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

IEEE Micro, 2004

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Int. J. Parallel Program., 2004

Dynamic Memory Instruction Bypassing.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Int. J. Parallel Program., 2004

A partitioned instruction queue to reduce instruction wakeup energy.

[BibT_eX]

[DOI]

Marco Antonio Ramírez

,

Adrián Cristal

,

,

Alexander V. Veidenbaum

,

Int. J. High Perform. Comput. Netw., 2004

High-performance and low-power VLIW cores for numerical computations.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Eduard Ayguadé

,

,

,

Int. J. High Perform. Comput. Netw., 2004

A latency-conscious SMT branch prediction architecture.

[BibT_eX]

[DOI]

,

Oliverio J. Santana

,

,

Int. J. High Perform. Comput. Netw., 2004

Future ILP processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

,

,

Int. J. High Perform. Comput. Netw., 2004

Optimising long-latency-load-aware fetch policies for SMT processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

,

,

Enrique Fernández

Int. J. High Perform. Comput. Netw., 2004

Evaluating kilo-instruction multiprocessors.

[BibT_eX]

[DOI]

,

,

Valentin Puente

,

José-Ángel Gregorio

,

Adrián Cristal

,

Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Initial Evaluation of Multimedia Extensions on VLIW Architectures.

[BibT_eX]

[DOI]

,

Proceedings of the Computer Systems: Architectures, 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Eduard Ayguadé

,

,

,

Proceedings of the Computer Systems: Architectures, 2004

An Optimized Front-End Physical Register File with Banking and Writeback Filtering.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Rubén González

,

Adrián Cristal

,

Alexander V. Veidenbaum

,

Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

Dynamically Controlled Resource Allocation in SMT Processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

,

,

Enrique Fernández

Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

The impact of traffic aggregation on the memory performance of networking applications.

[BibT_eX]

[DOI]

,

,

Mario Nemirovsky

,

Proceedings of the 2004 workshop on MEmory performance, 2004

A Content Aware Integer Register File Organization.

[BibT_eX]

[DOI]

Rubén González

,

Adrián Cristal

,

,

Alexander V. Veidenbaum

,

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Prophet/Critic Hybrid Branch Prediction.

[BibT_eX]

[DOI]

,

,

,

,

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

DCache Warn: An I-Fetch Policy to Increase SMT Efficiency.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

,

,

Enrique Fernández

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors.

[BibT_eX]

[DOI]

,

,

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Out-of-Order Commit Processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

,

,

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Enabling SMT for real-time embedded systems.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

Proceedings of the 2004 12th European Signal Processing Conference, 2004

Maintaining Thousands of In-flight Instructions.

[BibT_eX]

[DOI]

Adrián Cristal

,

Oliverio J. Santana

,

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Feasibility of QoS for SMT.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Implicit vs. Explicit Resource Allocation in SMT Processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

A first glance at Kilo-instruction based multiprocessors.

[BibT_eX]

[DOI]

,

Valentin Puente

,

Adrián Cristal

,

,

José-Ángel Gregorio

,

Proceedings of the First Conference on Computing Frontiers, 2004

Predictable performance in SMT processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Peter M. W. Knijnenburg

,

Rizos Sakellariou

,

Enrique Fernández

,

,

Proceedings of the First Conference on Computing Frontiers, 2004

Reducing Fetch Architecture Complexity Using Procedure Inlining.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003

A Cost-Effective Architecture for Vectorizable Numerical and Multimedia Applications.

[BibT_eX]

[DOI]

Francisca Quintana

,

,

,

Theory Comput. Syst., 2003

A Case for Resource-conscious Out-of-order Processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

José F. Martínez

,

,

IEEE Comput. Archit. Lett., 2003

Design and Implementation of High-Performance Memory Systems for Future Packet Buffers.

[BibT_eX]

[DOI]

Jorge García-Vidal

,

,

Llorenç Cerdà

,

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

An MPEG-4 performance study for non-SIMD, general purpose architectures.

[BibT_eX]

[DOI]

,

,

Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

A Simple Low-Energy Instruction Wakeup Mechanism.

[BibT_eX]

[DOI]

Marco Antonio Ramírez

,

Adrián Cristal

,

Alexander V. Veidenbaum

,

,

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes.

[BibT_eX]

[DOI]

Miquel Pericàs

,

Eduard Ayguadé

,

,

,

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Tolerating Branch Predictor Latency on SMT.

[BibT_eX]

[DOI]

,

Oliverio J. Santana

,

,

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Kilo-instruction Processors.

[BibT_eX]

[DOI]

Adrián Cristal

,

,

,

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Improving Memory Latency Aware Fetch Policies for SMT Processors.

[BibT_eX]

[DOI]

Francisco J. Cazorla

,

Enrique Fernández

,

,

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Hierarchical Clustered Register File Organization for VLIW Processors.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A conflict-free memory banking architecture for fast VOQ packet buffers.

[BibT_eX]

[DOI]

,

Llorenç Cerdà

,

,

Proceedings of the Global Telecommunications Conference, 2003

2002

Errata on "Measuring Experimental Error in Microprocessor Simulation".

[BibT_eX]

[DOI]

Rajagopalan Desikan

,

,

Stephen W. Keckler

,

José-Lorenzo Cruz

,

Fernando Latorre

,

Antonio González

,

SIGARCH Comput. Archit. News, 2002

Software Trace Cache for Commercial Applications.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

,

,

Josep Torrellas

Int. J. Parallel Program., 2002

Initial Results on Fuzzy Floating Point Computation for Multimedia Processors.

[BibT_eX]

[DOI]

Carlos Álvarez

,

,

,

IEEE Comput. Archit. Lett., 2002

Fetching instruction streams.

[BibT_eX]

[DOI]

,

Oliverio J. Santana

,

Josep Lluís Larriba-Pey

,

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Three-dimensional memory vectorization for high bandwidth media memory systems.

[BibT_eX]

[DOI]

,

,

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

A Comprehensive Analysis of Indirect Branch Prediction.

[BibT_eX]

[DOI]

Oliverio J. Santana

,

,

Enrique Fernández

,

,

,

Proceedings of the High Performance Computing, 4th International Symposium, 2002

Studying New Ways for Improving Adaptive History Length Branch Predictors.

[BibT_eX]

[DOI]

,

Oliverio J. Santana

,

,

Enrique Fernández

,

,

Proceedings of the High Performance Computing, 4th International Symposium, 2002

Speculative Dynamic Vectorization.

[BibT_eX]

[DOI]

,

Antonio González

,

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Hardware Schemes for Early Register Release.

[BibT_eX]

[DOI]

,

Víctor Viñals

,

Antonio González

,

Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

A Comparative Study of Redundancy in Trace Caches (Research Note).

[BibT_eX]

[DOI]

Hans Vandierendonck

,

,

Koenraad De Bosschere

,

Proceedings of the Euro-Par 2002, 2002

Cost effective memory disambiguation for multimedia codes.

[BibT_eX]

[DOI]

,

,

Carlos Álvarez

,

Proceedings of the International Conference on Compilers, 2002

Cost-Effective Compiler Directed Memory Prefetching and Bypassing.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

,

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures.

[BibT_eX]

[DOI]

,

,

,

Eduard Ayguadé

IEEE Trans. Computers, 2001

Lifetime-Sensitive Modulo Scheduling in a Production Environment.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

Antonio González

,

,

IEEE Trans. Computers, 2001

Parallel architecture and compilation techniques: selection of workshop papers, guests' editors introduction.

[BibT_eX]

[DOI]

Sandro Bartolini

,

,

,

Cosimo Antonio Prete

,

SIGARCH Comput. Archit. News, 2001

Instruction fetch architectures and code layout optimizations.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

Proc. IEEE, 2001

Early 21st Century Processors - Guest Editors' Introduction.

[BibT_eX]

[DOI]

Sriram Vajapeyam

,

Computer, 2001

Modulo scheduling with integrated register spilling for clustered VLIW architectures.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

MIRS: Modulo Scheduling with Integrated Register Spilling.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the Languages and Compilers for Parallel Computing, 2001

Code layout optimizations for transaction processing workloads.

[BibT_eX]

[DOI]

,

Luiz André Barroso

,

Kourosh Gharachorloo

,

,

Josep Lluís Larriba-Pey

,

P. Geoffrey Lowney

,

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

A novel renaming mechanism that boosts software prefetching.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 15th international conference on Supercomputing, 2001

On the potential of tolerant region reuse for multimedia applications.

[BibT_eX]

[DOI]

Carlos Álvarez

,

,

,

Proceedings of the 15th international conference on Supercomputing, 2001

DLP + TLP Processors for the Next Generation of Media Workloads.

[BibT_eX]

[DOI]

,

,

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Topic 15+20: Multimedia and Embedded Systems.

[BibT_eX]

[DOI]

Stamatis Vassiliadis

,

Francky Catthoor

,

,

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Branch Prediction Using Profile Data.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

On the Efficiency of Reductions in µ-SIMD Media Extensions.

[BibT_eX]

[DOI]

,

,

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Dynamic Register Renaming Through Virtual-Physical Registers.

[BibT_eX]

[DOI]

,

Antonio González

,

,

José González

,

Víctor Viñals

J. Instr. Level Parallelism, 2000

Improved spill code generation for software pipelined loops.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Two-level hierarchical register file organization for VLIW processors.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Architectures for One Billion of Transistors.

[BibT_eX]

[DOI]

Proceedings of the 13th International Symposium on System Synthesis, 2000

Multiple-banked register file architectures.

[BibT_eX]

[DOI]

José-Lorenzo Cruz

,

Antonio González

,

,

Nigel P. Topham

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Trace Cache Redundancy: Red & Blue Traces.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

On the Performance of Fetch Engines Running DSS Workloads.

[BibT_eX]

[DOI]

,

,

Josep Lluís Larriba-Pey

,

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Parallel Computer Architecture.

[BibT_eX]

[DOI]

Silvia M. Müller

,

,

,

Stamatis Vassiliadis

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

The Effect of Code Reordering on Branch Prediction.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

A Simulation Study of Decoupled Vector Architectures.

[BibT_eX]

[DOI]

,

J. Supercomput., 1999

Enhancing and Exploiting the Locality.

[BibT_eX]

[DOI]

Veljko M. Milutinovic

,

IEEE Trans. Computers, 1999

MOM: a Matrix SIMD Instruction Set Architecture for Multimedia Applications.

[BibT_eX]

[DOI]

,

,

Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Delaying Physical Register Allocation through Virtual-Physical Registers.

[BibT_eX]

[DOI]

,

Antonio González

,

,

José González

,

Víctor Viñals

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Exploiting a New Level of DLP in Multimedia Applications.

[BibT_eX]

[DOI]

,

,

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Software trace cache.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

,

Josep Torrellas

,

Proceedings of the 13th international conference on Supercomputing, 1999

Adding a vector unit to a superscalar processor.

[BibT_eX]

[DOI]

Francisca Quintana

,

,

,

Proceedings of the 13th international conference on Supercomputing, 1999

Increasing effective IPC by exploiting distant parallelism.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 13th international conference on Supercomputing, 1999

Optimization of Instruction Fetch for Decision Support Workloads.

[BibT_eX]

[DOI]

,

Josep Lluís Larriba-Pey

,

,

,

,

Josep Torrellas

Proceedings of the International Conference on Parallel Processing 1999, 1999

Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the International Conference on Parallel Processing 1999, 1999

Instruction-Level Parallelism and Uniprocessor Architecture - Introduction.

[BibT_eX]

[DOI]

,

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

Quantifying the Benefits of SPECint Distant Parallelism in Simultaneous Multi-Threading Architectures.

[BibT_eX]

[DOI]

,

,

Venkata Krishnan

,

Eduard Ayguadé

,

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998

Modulo Scheduling with Reduced Register Pressure.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Antonio González

IEEE Trans. Computers, 1998

Quantitative Evaluation of Register Pressure on Software Pipelined Loops.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

Int. J. Parallel Program., 1998

Registers Size Influence on Vector Architectures.

[BibT_eX]

[DOI]

,

,

Proceedings of the Vector and Parallel Processing, 1998

An ISA Comparison Between Superscalar and Vector Processors.

[BibT_eX]

[DOI]

Francisca Quintana

,

,

Proceedings of the Vector and Parallel Processing, 1998

Effective usage of vector registers in decoupled vector architectures.

[BibT_eX]

[DOI]

,

,

Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

A case for merging the ILP and DLP paradigms.

[BibT_eX]

[DOI]

Francisca Quintana

,

,

Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

Widening Resources: A Cost-effective Technique for Aggressive ILP Architectures.

[BibT_eX]

[DOI]

,

,

,

Eduard Ayguadé

Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

A Performance Study of Out-of-order Vector Architectures and Short Registers.

[BibT_eX]

[DOI]

,

,

Proceedings of the 12th international conference on Supercomputing, 1998

Resource Widening Versus Replication: Limits and Performance-cost Trade-off.

[BibT_eX]

[DOI]

,

,

,

Eduard Ayguadé

Proceedings of the 12th international conference on Supercomputing, 1998

Vector Architectures: Past, Present and Future.

[BibT_eX]

[DOI]

,

,

Proceedings of the 12th international conference on Supercomputing, 1998

Virtual-Physical Registers.

[BibT_eX]

[DOI]

Antonio González

,

José González

,

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Command Vector Memory Systems: High Performance at Low Cost.

[BibT_eX]

[DOI]

,

,

Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997

Exploiting instruction- and data-level parallelism.

[BibT_eX]

[DOI]

,

IEEE Micro, 1997

Out-of-Order Vector Architectures.

[BibT_eX]

[DOI]

,

,

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs.

[BibT_eX]

[DOI]

,

,

,

Eduard Ayguadé

Proceedings of the 11th international conference on Supercomputing, 1997

Eliminating Cache Conflict Misses through XOR-Based Placement Functions.

[BibT_eX]

[DOI]

Antonio González

,

,

Nigel P. Topham

,

Joan-Manuel Parcerisa

Proceedings of the 11th international conference on Supercomputing, 1997

A Victim Cache for Vector Registers.

[BibT_eX]

[DOI]

,

Proceedings of the 11th international conference on Supercomputing, 1997

Multithreaded Vector Architectures.

[BibT_eX]

[DOI]

,

Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Virtual registers.

[BibT_eX]

[DOI]

Antonio González

,

,

José González

,

Teresa Monreal Arnal

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance.

[BibT_eX]

[DOI]

,

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Effective Usage of Vector Registers in Advanced Vector Architectures.

[BibT_eX]

[DOI]

,

,

Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

Static Locality Analysis for Cache Management.

[BibT_eX]

[DOI]

F. Jesús Sánchez

,

Antonio González

,

Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996

Loop Parallelization: Revisiting Framework of Unimodular Transformations.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

,

Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996

Heuristics for Register-Constrained Software Pipelining.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Decoupled Vector Architectures.

[BibT_eX]

[DOI]

,

Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

Swing modulo scheduling: a lifetime-sensitive approach.

[BibT_eX]

[DOI]

,

Antonio González

,

Eduard Ayguadé

,

Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995

Conflict-Free Access for Streams in Multimodule Memories.

[BibT_eX]

[DOI]

,

,

,

Eduard Ayguadé

IEEE Trans. Computers, 1995

Analyzing reference patterns in automatic data distribution tools.

[BibT_eX]

[DOI]

Eduard Ayguadé

,

,

,

Mercè Gironés

,

Int. J. Parallel Program., 1995

Quantitative analysis of vector code.

[BibT_eX]

[DOI]

,

,

,

,

Eduard Ayguadé

Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

Hypernode reduction modulo scheduling.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Antonio González

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Vector Multiprocessors with Arbitrated Memory Access.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality.

[BibT_eX]

[DOI]

Antonio González

,

,

Proceedings of the 9th international conference on Supercomputing, 1995

Non-Consistent Dual Register Files to Reduce Register Pressure.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Automatic generation of loop scheduling for VLIW.

[BibT_eX]

[DOI]

Cristina Barrado

,

,

Eduard Ayguadé

,

Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994

Network Synchronization and Out-of-Order Access to Vectors.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

Parallel Process. Lett., 1994

Access To Vectors In Multi-module Memories.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994

Detecting and Using Affinity in an Automatic Data Distribution Tool.

[BibT_eX]

[DOI]

Eduard Ayguadé

,

,

Mercè Gironés

,

,

,

Proceedings of the Languages and Compilers for Parallel Computing, 1994

Synchronized access to streams in SIMD vector multiprocessors.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 8th international conference on Supercomputing, 1994

Memory Access Synchronization in Vector Multiprocessors.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the Parallel Processing: CONPAR 94, 1994

Using Sacks to Organize Registers in VLIW Machines.

[BibT_eX]

[DOI]

,

,

José A. B. Fortes

,

Eduard Ayguadé

Proceedings of the Parallel Processing: CONPAR 94, 1994

1993

Chairmen's introduction.

[BibT_eX]

[DOI]

,

Jordi Cortadella

,

Antonio González

Microprocess. Microprogramming, 1993

Conflict-free access to streams in multiprocessor systems.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

,

Microprocess. Microprogramming, 1993

Access to streams in multiprocessor systems.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993

Align and Distribute-based Linear Loop Transformations.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

,

Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992

A method for implementation of one-dimensional systolic algorithms with data contraflow using pipelined functional units.

[BibT_eX]

[DOI]

Miguel Valero-García

,

Juan J. Navarro

,

José María Llabería

,

,

J. VLSI Signal Process., 1992

Increasing the Number of Strides for Conflict-Free Vector Access.

[BibT_eX]

[DOI]

,

,

José M. Llabería

,

,

Eduard Ayguadé

,

Juan J. Navarro

Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Conflict-free access of vectors with power-of-two strides.

[BibT_eX]

[DOI]

,

,

Eduard Ayguadé

Proceedings of the 6th international conference on Supercomputing, 1992

1991

Conflict-Free Strides for Vectors in Matched Memories.

[BibT_eX]

[DOI]

,

,

José María Llabería

,

,

Juan J. Navarro

,

Eduard Ayguadé

Parallel Process. Lett., 1991

Balanced Loop Partitioning Using GTS.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

,

,

José M. Llabería

Proceedings of the Languages and Compilers for Parallel Computing, 1991

On Automatic Loop Data-Mapping for Distributed-Memory Multiprocessors.

[BibT_eX]

[DOI]

,

Eduard Ayguadé

,

,

José M. Llabería

,

Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

Mapping QR decomposition of a banded matrix on a 1D systolic array with data contraflow and pipelined functional units.

[BibT_eX]

Miguel Valero-García

,

Juan J. Navarro

,

José J. M. Liabería

,

,

Proceedings of the Algorithms and Parallel VLSI Architectures II, 1991

1990

Implementation of systolic algorithms using pipelined functional units.

[BibT_eX]

[DOI]

Miguel Valero-García

,

Juan J. Navarro

,

José M. Llabería

,

Proceedings of the Application Specific Array Processors, 1990

1989

A block algorithm and optimal fixed-size systolic array processor for the algebraic path problem.

[BibT_eX]

[DOI]

Fernando J. Nuñez

,

J. VLSI Signal Process., 1989

Systematic Hardware Adaptation of Systolic Algorithms.

[BibT_eX]

[DOI]

Miguel Valero-García

,

Juan J. Navarro

,

José M. Llabería

,

Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

1987

A Discrete Optimization Problem in Local Networks and Data Alignment.

[BibT_eX]

[DOI]

Miguel Angel Fiol

,

J. Luis A. Yebra

,

,

IEEE Trans. Computers, 1987

Partitioning: An Essential Step in Mapping Algorithms Into Systolic Array Processors.

[BibT_eX]

[DOI]

Juan J. Navarro

,

José M. Llabería

,

Computer, 1987

1986

Computing Size-Independent Matrix Problems on Systolic Array Processors.

[BibT_eX]

[DOI]

Juan J. Navarro

,

José M. Llabería

,

Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986, 1986

Solving Matrix Problems with No Size Restriction on a Systolic Array Processor.

[BibT_eX]

Juan J. Navarro

,

José M. Llabería

,

Proceedings of the International Conference on Parallel Processing, 1986

1985

Analysis and Simulation of Multiplexed Single-Bus Networks With and Without Buffering.

[BibT_eX]

[DOI]

José M. Llabería

,

,

Enrique Herrada Lillo

,

Proceedings of the 12th Annual Symposium on Computer Architecture, 1985

1983

Reduction of Connections for Multibus Organization.

[BibT_eX]

[DOI]

,

,

Miguel Angel Fiol

IEEE Trans. Computers, 1983

A performance evaluation of the multiple bus network for multiprocessor systems.

[BibT_eX]

[DOI]

,

José María Llabería

,

,

Emilio Sanvicente

,

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 1983

1982

Bandwidth of Crossbar and Multiple-Bus Connections for Multiprocessors.

[BibT_eX]

[DOI]

,

,

IEEE Trans. Computers, 1982
