Salvador Petit

Orcid: 0000-0003-2426-4134

Affiliations:
  • Polytechnic University of Valencia, Spain


According to our database, Salvador Petit authored at least 107 papers between 2000 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Main memory controller with multiple media technologies for big data workloads.
J. Big Data, 2023

Cloud White: Detecting and Estimating QoS Degradation of Latency-Critical Workloads in the Public Cloud.
Future Gener. Comput. Syst., 2023

SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors.
CoRR, 2023

Stratus: A Hardware/Software Infrastructure for Controlled Cloud Research.
Proceedings of the 31st Euromicro International Conference on Parallel, 2023

Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
DeepP: Deep Learning Multi-Program Prefetch Configuration for the IBM POWER 8.
IEEE Trans. Computers, 2022

VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors.
IEEE Trans. Computers, 2022

Effect of Hyper-Threading in Latency-Critical Multithreaded Cloud Applications and Utilization Analysis of the Major System Resources.
Future Gener. Comput. Syst., 2022

A Neural Network to Estimate Isolated Performance from Multi-Program Execution.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022

Fast-track cache: a huge racetrack memory L1 data cache.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Segment Switching: A New Switching Strategy for Optical HPC Networks.
IEEE Access, 2021

2020
Phase-Aware Cache Partitioning to Target Both Turnaround Time and System Performance.
IEEE Trans. Parallel Distributed Syst., 2020

Bandwidth-Aware Dynamic Prefetch Configuration for IBM POWER8.
IEEE Trans. Parallel Distributed Syst., 2020

Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors.
IEEE Trans. Parallel Distributed Syst., 2020

An efficient cache flat storage organization for multithreaded workloads for low power processors.
Future Gener. Comput. Syst., 2020

Understanding Cloud Workloads Performance in a Production like Environment.
CoRR, 2020

Impact of the Array Shape and Memory Bandwidth on the Execution Time of CNN Systolic Arrays.
Proceedings of the 23rd Euromicro Conference on Digital System Design, 2020

2019
Way Combination for an Adaptive and Scalable Coherence Directory.
IEEE Trans. Parallel Distributed Syst., 2019

FOS: a low-power cache organization for multicores.
J. Supercomput., 2019

An Aging-Aware GPU Register File Design Based on Data Redundancy.
IEEE Trans. Computers, 2019

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance.
IEEE Trans. Computers, 2019

Modeling and analysis of the performance of exascale photonic networks.
Concurr. Comput. Pract. Exp., 2019

2018
Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective.
J. Parallel Distributed Comput., 2018

Accurately modeling the on-chip and off-chip GPU memory subsystem.
Future Gener. Comput. Syst., 2018

A Workload Generator for Evaluating SMT Real-Time Systems.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Workload Characterization for Exascale Computing Networks.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Improving System Turnaround Time with Intel CAT by Identifying LLC Critical Applications.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017
On Microarchitectural Mechanisms for Cache Wearout Reduction.
IEEE Trans. Very Large Scale Integr. Syst., 2017

A Hardware Approach to Fairly Balance the Inter-Thread Interference in Shared Caches.
IEEE Trans. Parallel Distributed Syst., 2017

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling.
IEEE Trans. Parallel Distributed Syst., 2017

Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores.
IEEE Trans. Computers, 2017

A research-oriented course on Advanced Multicore Architecture: Contents and active learning methodologies.
J. Parallel Distributed Comput., 2017

Exploiting Data Compression to Mitigate Aging in GPU Register Files.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Modeling a Photonic Network for Exascale Computing.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Application Clustering Policies to Address System Fairness with Intel's Cache Allocation Technology.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Bandwidth-Aware On-Line Scheduling in SMT Multicores.
IEEE Trans. Computers, 2016

A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks.
Future Gener. Comput. Syst., 2016

Enhancing the L1 Data Cache Design to Mitigate HCI.
IEEE Comput. Archit. Lett., 2016

Impact of Memory-Level Parallelism on the Performance of GPU Coherence Protocols.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Accurately modeling a photonic NoC in a detailed CMP simulation framework.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016

Symbiotic job scheduling on the IBM POWER8.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Student Research Poster: A Low Complexity Cache Sharing Mechanism to Address System Fairness.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Design of Hybrid Second-Level Caches.
IEEE Trans. Computers, 2015

A reuse-based refresh policy for energy-aware eDRAM caches.
Microprocess. Microsystems, 2015

A Research-Oriented Course on Advanced Multicore Architecture.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Addressing Fairness in SMT Multicores with a Progress-Aware Scheduler.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Current challenges in simulations of HPC systems.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Accurately modeling the GPU memory subsystem.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

2014
Efficient Register Renaming and Recovery for High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Cache-Hierarchy Contention-Aware Scheduling in CMPs.
IEEE Trans. Parallel Distributed Syst., 2014

Addressing bandwidth contention in SMT multicores through scheduling.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Dynamic WCET Estimation for Real-Time Multicore Embedded Systems Supporting DVFS.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Analyzing the Optimal Voltage/Frequency Pair in Fault-Tolerant Caches.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013
Hardware-Based Generation of Independent Subtraces of Instructions in Clustered Processors.
IEEE Trans. Computers, 2013

Power-aware scheduling with effective task migration for real-time multicore embedded systems.
Concurr. Comput. Pract. Exp., 2013

Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches.
Proceedings of the International Conference on Supercomputing, 2013

Using Huge Pages and Performance Counters to Determine the LLC Architecture.
Proceedings of the International Conference on Computational Science, 2013

Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes.
Proceedings of the Design, Automation and Test in Europe, 2013

L1-bandwidth aware thread allocation in multicore SMT processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches.
IEEE Trans. Very Large Scale Integr. Syst., 2012

A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions.
IEEE Trans. Parallel Distributed Syst., 2012

A cost-effective heuristic to schedule local and remote memory in cluster computers.
J. Supercomput., 2012

Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches.
IEEE Trans. Computers, 2012

Combining recency of information with selective random and a victim cache in last-level caches.
ACM Trans. Archit. Code Optim., 2012

Efficiently Handling Memory Accesses to Improve QoS in Multicore Systems under Real-Time Constraints.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Page-Based Memory Allocation Policies of Local and Remote Memory in Cluster Computers.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Analyzing the optimal ratio of SRAM banks in hybrid caches.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

OMHI 2012: First International Workshop on On-chip Memory Hierarchies and Interconnects: Organization, Management and Implementation.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011
A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems.
Comput. J., 2011

MRU-Tour-based Replacement Algorithms for Last-Level Caches.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

A Cluster Computer Performance Predictor for Memory Scheduling.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2011

A Dynamic Power-Aware Partitioner with Task Migration for Multicore Embedded Systems.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Dynamic task set partitioning based on balancing resource requirements and utilization to reduce power consumption.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Balancing Task Resource Requirements in Embedded Multithreaded Multicore Processors to Reduce Power Consumption.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

Out-of-order retirement of instructions in sequentially consistent multiprocessors.
Proceedings of the 28th International Conference on Computer Design, 2010

Extending a Multicore Multithread Simulator to Model Power-Aware Hard Real-Time Systems.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

A Scheduling Heuristic to Handle Local and Remote Memory in Cluster Computers.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Exploiting subtrace-level parallelism in clustered processors.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
A Complexity-Effective Out-of-Order Retirement Microarchitecture.
IEEE Trans. Computers, 2009

Power Reduction In Advanced Embedded IPC Processors.
Intell. Autom. Soft Comput., 2009

An hybrid eDRAM/SRAM macrocell to implement first-level data caches.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Dynamic task set partitioning based on balancing memory requirements to reduce power consumption.
Proceedings of the 23rd international conference on Supercomputing, 2009

A power-aware hybrid RAM-CAM renaming mechanism for fast recovery.
Proceedings of the 27th International Conference on Computer Design, 2009

Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2008
The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A simple power-aware scheduling for multicore systems when running real-time applications.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot.
Proceedings of the Euro-Par 2008, 2008

2007
Spim-Cache: A Pedagogical Tool for Teaching Cache Memories Through Code-Based Exercises.
IEEE Trans. Educ., 2007

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Leakage Current Reduction in Data Caches on Embedded Systems.
Proceedings of the 2007 International Conference on Intelligent Pervasive Computing, 2007

VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Addressing a workload characterization study to the design of consistency protocols.
J. Supercomput., 2006

RACFP: a training tool to work with floating-point representation, algorithms, and circuits in undergraduate courses.
IEEE Trans. Educ., 2006

An execution-driven simulation tool for teaching cache memories in introductory computer organization courses.
Proceedings of the 2006 Workshop on Computer Architecture Education, 2006

Applying the zeros switch-off technique to reduce static energy in data caches.
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

2005
Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors.
J. Syst. Archit., 2005

A Comparison Study of the HLRC-DU Protocol versus a HLRC Hardware Assisted Protocol.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Exploiting temporal locality in drowsy cache policies.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
Characterizing the Dynamic Behavior of Workload Execution in SVM systems.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

2002
Characterizing Parallel Workloads to Reduce Multiple Writer Overhead in Shared Virtual Memory Systems.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

2001
About the sensitivity of the HLRC-DU protocol on diff and page sizes.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

2000
LIDE: a simulation environment for shared virtual memory systems.
SIGARCH Comput. Archit. News, 2000

