Salvador Petit

J. Parallel Distributed Comput., 2018

Accurately modeling the on-chip and off-chip GPU memory subsystem.

Future Gener. Comput. Syst., 2018

A Workload Generator for Evaluating SMT Real-Time Systems.

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Workload Characterization for Exascale Computing Networks.

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Improving System Turnaround Time with Intel CAT by Identifying LLC Critical Applications.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache.

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017

On Microarchitectural Mechanisms for Cache Wearout Reduction.

IEEE Trans. Very Large Scale Integr. Syst., 2017

A Hardware Approach to Fairly Balance the Inter-Thread Interference in Shared Caches.

IEEE Trans. Parallel Distributed Syst., 2017

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling.

IEEE Trans. Parallel Distributed Syst., 2017

Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores.

IEEE Trans. Computers, 2017

A research-oriented course on Advanced Multicore Architecture: Contents and active learning methodologies.

J. Parallel Distributed Comput., 2017

Exploiting Data Compression to Mitigate Aging in GPU Register Files.

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Modeling a Photonic Network for Exascale Computing.

Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Application Clustering Policies to Address System Fairness with Intel's Cache Allocation Technology.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Bandwidth-Aware On-Line Scheduling in SMT Multicores.

IEEE Trans. Computers, 2016

A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks.

Future Gener. Comput. Syst., 2016

Enhancing the L1 Data Cache Design to Mitigate HCI.

IEEE Comput. Archit. Lett., 2016

Impact of Memory-Level Parallelism on the Performance of GPU Coherence Protocols.

Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Accurately modeling a photonic NoC in a detailed CMP simulation framework.

Proceedings of the International Conference on High Performance Computing & Simulation, 2016

Symbiotic job scheduling on the IBM POWER8.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Student Research Poster: A Low Complexity Cache Sharing Mechanism to Address System Fairness.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Design of Hybrid Second-Level Caches.

IEEE Trans. Computers, 2015

A reuse-based refresh policy for energy-aware eDRAM caches.

Microprocess. Microsystems, 2015

A Research-Oriented Course on Advanced Multicore Architecture.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Addressing Fairness in SMT Multicores with a Progress-Aware Scheduler.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Current challenges in simulations of HPC systems.

Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Accurately modeling the GPU memory subsystem.

Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

2014

Efficient Register Renaming and Recovery for High-Performance Processors.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Cache-Hierarchy Contention-Aware Scheduling in CMPs.

IEEE Trans. Parallel Distributed Syst., 2014

Addressing bandwidth contention in SMT multicores through scheduling.

Proceedings of the 2014 International Conference on Supercomputing, 2014

Dynamic WCET Estimation for Real-Time Multicore Embedded Systems Supporting DVFS.

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Analyzing the Optimal Voltage/Frequency Pair in Fault-Tolerant Caches.

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013

Hardware-Based Generation of Independent Subtraces of Instructions in Clustered Processors.

IEEE Trans. Computers, 2013

Power-aware scheduling with effective task migration for real-time multicore embedded systems.

Concurr. Comput. Pract. Exp., 2013

Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches.

Proceedings of the International Conference on Supercomputing, 2013

Using Huge Pages and Performance Counters to Determine the LLC Architecture.

Proceedings of the International Conference on Computational Science, 2013

Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes.

Proceedings of the Design, Automation and Test in Europe, 2013

L1-bandwidth aware thread allocation in multicore SMT processors.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches.

IEEE Trans. Very Large Scale Integr. Syst., 2012

A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions.

IEEE Trans. Parallel Distributed Syst., 2012

A cost-effective heuristic to schedule local and remote memory in cluster computers.

J. Supercomput., 2012

Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches.

IEEE Trans. Computers, 2012

Combining recency of information with selective random and a victim cache in last-level caches.

ACM Trans. Archit. Code Optim., 2012

Efficiently Handling Memory Accesses to Improve QoS in Multicore Systems under Real-Time Constraints.

Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Page-Based Memory Allocation Policies of Local and Remote Memory in Cluster Computers.

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Analyzing the optimal ratio of SRAM banks in hybrid caches.

Proceedings of the 30th International IEEE Conference on Computer Design, 2012

OMHI 2012: First International Workshop on On-chip Memory Hierarchies and Interconnects: Organization, Management and Implementation.

María Engracia Gómez

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011

A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems.

Comput. J., 2011

MRU-Tour-based Replacement Algorithms for Last-Level Caches.

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

A Cluster Computer Performance Predictor for Memory Scheduling.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2011

A Dynamic Power-Aware Partitioner with Task Migration for Multicore Embedded Systems.

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Dynamic task set partitioning based on balancing resource requirements and utilization to reduce power consumption.

Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Balancing Task Resource Requirements in Embedded Multithreaded Multicore Processors to Reduce Power Consumption.

Diana Bautista Rayo

Julio Sahuquillo Borrás

Houcine Hassan Mohamed

Pedro Juan López Rodríguez

José Duato

Proceedings of the 18th Euromicro Conference on Parallel, 2010

Out-of-order retirement of instructions in sequentially consistent multiprocessors.

Proceedings of the 28th International Conference on Computer Design, 2010

Extending a Multicore Multithread Simulator to Model Power-Aware Hard Real-Time Systems.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

A Scheduling Heuristic to Handle Local and Remote Memory in Cluster Computers.

Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Exploiting subtrace-level parallelism in clustered processors.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

A Complexity-Effective Out-of-Order Retirement Microarchitecture.

Salvador Petit Marti

Julio Sahuquillo Borrás

Rafael Ubal Tena

José Duato Marín

IEEE Trans. Computers, 2009

Power Reduction In Advanced Embedded IPC Processors.

Intell. Autom. Soft Comput., 2009

An hybrid eDRAM/SRAM macrocell to implement first-level data caches.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Dynamic task set partitioning based on balancing memory requirements to reduce power consumption.

Proceedings of the 23rd international conference on Supercomputing, 2009

A power-aware hybrid RAM-CAM renaming mechanism for fast recovery.

Proceedings of the 27th International Conference on Computer Design, 2009

Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions.

Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2008

The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A simple power-aware scheduling for multicore systems when running real-time applications.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot.

Proceedings of the Euro-Par 2008, 2008

2007

Spim-Cache: A Pedagogical Tool for Teaching Cache Memories Through Code-Based Exercises.

IEEE Trans. Educ., 2007

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors.

Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Leakage Current Reduction in Data Caches on Embedded Systems.

Proceedings of the 2007 International Conference on Intelligent Pervasive Computing, 2007

VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Addressing a workload characterization study to the design of consistency protocols.

J. Supercomput., 2006

RACFP: a training tool to work with floating-point representation, algorithms, and circuits in undergraduate courses.

IEEE Trans. Educ., 2006

An execution-driven simulation tool for teaching cache memories in introductory computer organization courses.

Proceedings of the 2006 Workshop on Computer Architecture Education, 2006

Applying the zeros switch-off technique to reduce static energy in data caches.

Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

2005

Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors.

Veljko M. Milutinovic

J. Syst. Archit., 2005

A Comparison Study of the HLRC-DU Protocol versus a HLRC Hardware Assisted Protocol.

Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Exploiting temporal locality in drowsy cache policies.

Proceedings of the Second Conference on Computing Frontiers, 2005

2004

Characterizing the Dynamic Behavior of Workload Execution in SVM systems.

Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

2002

Characterizing Parallel Workloads to Reduce Multiple Writer Overhead in Shared Virtual Memory Systems.

Proceedings of the 10th Euromicro Workshop on Parallel, 2002

2001

About the sensitivity of the HLRC-DU protocol on diff and page sizes.
