Alex Ramírez

Dimitrios S. Nikolopoulos

David R. Kaeli

Satoshi Matsuoka

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

Dynamic Cache Partitioning Based on the MLP of Cache Misses.

Trans. High Perform. Embed. Archit. Compil., 2011

A Highly Scalable Parallel Implementation of H.264.

Arnaldo Azevedo

Ben H. H. Juurlink

Cor Meenderinck

Andrei Sergeevich Terechko

Trans. High Perform. Embed. Archit. Compil., 2011

Simulating Whole Supercomputer Applications.

IEEE Micro, 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.

Int. J. Parallel Program., 2011

Scalable multicore architectures for long DNA sequence comparison.

Concurr. Comput. Pract. Exp., 2011

Breaking the bandwidth wall in chip multiprocessors.

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Supercomputing: Past, present, and a possible future.

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Trace-driven simulation of multithreaded applications.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

On the memory system requirements of future scientific applications: Four case-studies.

Milan Pavlovic

Yoav Etsion

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores.

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Scaling HMMER Performance on Multicore Architectures.

Proceedings of the International Conference on Complex, 2011

Parametrizing multicore architectures for multiple sequence alignment.

Proceedings of the 8th Conference on Computing Frontiers, 2011

Scalability Evaluation of a Polymorphic Register File: A CG Case Study.

Catalin Bogdan Ciobanu

Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Advancing Computational Science, Visualization and Homeland Security Research/ Education at Minority Serving Institutions National Model Promoted/ Implemented by MSI-CIEC (Minority Serving Institutions-CyberInfrastructure Empowerment Coalition).

Proceedings of the International Conference on Computational Science, 2010

The SARC Architecture.

Felipe Cabarcas

Ben H. H. Juurlink

Mauricio Alvarez-Mesa

Friman Sánchez

Arnaldo Azevedo

Cor Meenderinck

Catalin Bogdan Ciobanu

Sebastián Isaza

Georgi Gaydadjiev

IEEE Micro, 2010

ArchExplorer for Automatic Design Space Exploration.

IEEE Micro, 2010

A Polymorphic Register File for matrix operations.

Catalin Bogdan Ciobanu

Georgi Kuzmanov

Georgi Gaydadjiev

Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Interleaving granularity on high bandwidth memory architecture for CMPs.

Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Task Superscalar: An Out-of-Order Task Pipeline.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Can Manycores Support the Memory Requirements of Scientific Applications?

Milan Pavlovic

Yoav Etsion

Proceedings of the Computer Architecture, 2010

Comparing last-level cache designs for CMP architectures.

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010

Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors.

Paul M. Carpenter

Eduard Ayguadé

Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Long DNA Sequence Comparison on Multicore Architectures.

Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Starsscheck: A Tool to Find Errors in Task-Based Parallel Programs.

Paul M. Carpenter

Eduard Ayguadé

Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Empowering Business Students - Using Web 2.0 Tools in the Classroom.

Proceedings of the Second International Conference on Computer Supported Education (CSEDU 2010), Valencia, Spain, April 7-10, 2010

Scalability Analysis of Progressive Alignment on a Multicore.

Proceedings of the CISIS 2010, 2010

2009

Parallel Scalability of Video Decoders.

J. Signal Process. Syst., 2009

DIA: A Complexity-Effective Decoding Architecture.

IEEE Trans. Computers, 2009

Available task-level parallelism on the Cell BE.

Alejandro Rico

Andrei Sergeevich Terechko

Sci. Program., 2009

CellSs: Scheduling techniques to better exploit memory hierarchy.

Sci. Program., 2009

FlexDCP: a QoS framework for CMP architectures.

ACM SIGOPS Oper. Syst. Rev., 2009

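Evaluación del rendimiento paralelo en el nivel macro bloque del decodificador H.264 en una arquitectura multiprocesador cc-NUMA. (Evaluation of parallel performance at the macroblock level of the H.264 decoder on a cc-NUMA multiprocessor architecture.)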

Rev. Avances en Sistemas Informática, 2009

Thread to Core Assignment in SMT On-Chip Multiprocessors.

Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009

Scalability of Macroblock-level Parallelism for H.264 Decoding.

Mauricio Alvarez-Mesa

Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Parallel H.264 Decoding on an Embedded Multicore Processor.

Arnaldo Azevedo

Cor Meenderinck

Ben H. H. Juurlink

Jan Hoogerbrugge

Mauricio Alvarez

Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Quantitative analysis of sequence alignment applications on multiprocessor architectures.

Friman Sánchez

Proceedings of the 6th Conference on Computing Frontiers, 2009

Mapping stream programs onto heterogeneous multiprocessor systems.

Paul M. Carpenter

Dionisios N. Pnevmatikatos

Eduard Ayguadé

Proceedings of the 2009 International Conference on Compilers, 2009

2008

Multicore Resource Management.

IEEE Micro, 2008

Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications.

Proceedings of the Embedded Computer Systems: Architectures, 2008

Analysis of video filtering on the cell processor.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

MFLUSH: Handling Long-Latency Loads in SMT On-Chip Multiprocessors.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

2007

High-Performance Embedded Architecture and Compilation Roadmap.

Michael F. P. O'Boyle

Trans. High Perform. Embed. Archit. Compil., 2007

Enlarging Instruction Streams.

IEEE Trans. Computers, 2007

Explaining Dynamic Cache Partitioning Speed Ups.

IEEE Comput. Archit. Lett., 2007

Online Prediction of Applications Cache Utility.

Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

On the Problem of Minimizing Workload Execution Time in SMT Processors.

Enrique Fernández

Rizos Sakellariou

Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

A Streaming Machine Description and Programming Model.

Proceedings of the Embedded Computer Systems: Architectures, 2007

Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications.

Daniel Jiménez-González

Xavier Martorell

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications.

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications.

Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

MLP-Aware Dynamic Cache Partitioning.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Predictable Performance in SMT Processors: Synergy between the OS and SMTs.

IEEE Trans. Computers, 2006

Performance Analysis of Sequence Alignment Applications.

Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Branch predictor guided instruction decoding.

Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005

Software Trace Cache.

IEEE Trans. Computers, 2005

Better Branch Prediction Through Prophet/Critic Hybrids.

IEEE Micro, 2005

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Multiple Stream Prediction.

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Effective Instruction Prefetching via Fetch Prestaging.

Ayose Falcón

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

A Complexity-Effective Simultaneous Multithreading Architecture.

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Architectural support for real-time task scheduling in SMT processors.

Proceedings of the 2005 International Conference on Compilers, 2005

2004

A low-complexity fetch architecture for high-performance superscalar processors.

ACM Trans. Archit. Code Optim., 2004

QoS for High-Performance SMT Processors in Embedded Systems.

Rizos Sakellariou

Enrique Fernández

IEEE Micro, 2004

A latency-conscious SMT branch prediction architecture.

Int. J. High Perform. Comput. Netw., 2004

Optimising long-latency-load-aware fetch policies for SMT processors.

Int. J. High Perform. Comput. Netw., 2004

Dynamically Controlled Resource Allocation in SMT Processors.

Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Prophet/Critic Hybrid Branch Prediction.

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

DCache Warn: An I-Fetch Policy to Increase SMT Efficiency.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors.

Ayose Falcón

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Enabling SMT for real-time embedded systems.

Proceedings of the 2004 12th European Signal Processing Conference, 2004

Feasibility of QoS for SMT.

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Implicit vs. Explicit Resource Allocation in SMT Processors.

Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

Predictable performance in SMT processors.

Proceedings of the First Conference on Computing Frontiers, 2004

Reducing Fetch Architecture Complexity Using Procedure Inlining.

Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003

Tolerating Branch Predictor Latency on SMT.

Proceedings of the High Performance Computing, 5th International Symposium, 2003

Improving Memory Latency Aware Fetch Policies for SMT Processors.

Proceedings of the High Performance Computing, 5th International Symposium, 2003

2002

High performance instruction fetch using software and hardware co-design.

PhD thesis, 2002

Software Trace Cache for Commercial Applications.

Carlos Navarro

Josep Torrellas

Int. J. Parallel Program., 2002

Fetching instruction streams.

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

A Comprehensive Analysis of Indirect Branch Prediction.

Proceedings of the High Performance Computing, 4th International Symposium, 2002

Studying New Ways for Improving Adaptive History Length Branch Predictors.

Proceedings of the High Performance Computing, 4th International Symposium, 2002

A Comparative Study of Redundancy in Trace Caches (Research Note).

Hans Vandierendonck

Koenraad De Bosschere

Proceedings of the Euro-Par 2002, 2002

2001

Instruction fetch architectures and code layout optimizations.

Proc. IEEE, 2001

Code layout optimizations for transaction processing workloads.

P. Geoffrey Lowney

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Branch Prediction Using Profile Data.

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000

Trace Cache Redundancy: Red & Blue Traces.

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

On the Performance of Fetch Engines Running DSS Workloads.

Carlos Navarro

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

The Effect of Code Reordering on Branch Prediction.

Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999

Software trace cache.

Carlos Navarro

Josep Torrellas

Proceedings of the 13th international conference on Supercomputing, 1999

Optimization of Instruction Fetch for Decision Support Workloads.
