Alberto Ros

Orcid: 0000-0001-5757-1064

Affiliations:
  • University of Murcia, Computer Engineering Department, Spain


According to our database, Alberto Ros authored at least 106 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Wrong-Path-Aware Entangling Instruction Prefetcher.
IEEE Trans. Computers, February, 2024

On the interactions between ILP and TLP with hardware transactional memory.
Microprocess. Microsystems, 2024

Effective Context-Sensitive Memory Dependence Prediction.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
Speculative inter-thread store-to-load forwarding in SMT architectures.
J. Parallel Distributed Comput., March, 2023

Fine-grain data classification to filter token coherence traffic.
J. Parallel Distributed Comput., January, 2023

MBPlib: Modular Branch Prediction Library.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Rebasing Microarchitectural Research with Industry Traces.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
DeTraS: Delaying Stores for Friendly-Fire Mitigation in Hardware Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2022

Compiler-Assisted Compaction/Restoration of SIMD Instructions.
IEEE Trans. Parallel Distributed Syst., 2022

Analysing software prefetching opportunities in hardware transactional memory.
J. Supercomput., 2022

Analysis of the Interactions Between ILP and TLP With Hardware Transactional Memory.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022

Exploring Instruction Fusion Opportunities in General Purpose Processors.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Berti: an Accurate Local-Delta Data Prefetcher.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Free atomics: hardware atomic operations without fences.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Splash-4: A Modern Benchmark Suite with Lock-Free Constructs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Composite Instruction Prefetching.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

2021
Efficient classification of private memory blocks.
J. Parallel Distributed Comput., 2021

On Value Recomputation to Accelerate Invisible Speculation.
CoRR, 2021

Do Not Predict - Recompute! How Value Recomputation Can Truly Boost the Performance of Invisible Speculation.
Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Splash-4: Improving Scalability with Lock-Free Constructs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

A Cost-Effective Entangling Prefetcher for Instructions.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

TSOPER: Efficient Coherence-Based Strict Persistency.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Concurrent Irrevocability in Best-Effort Hardware Transactional Memory.
IEEE Trans. Parallel Distributed Syst., 2020

Understanding Selective Delay as a Method for Efficient Secure Speculative Execution.
IEEE Trans. Computers, 2020

PfTouch: Concurrent page-fault handling for Intel restricted transactional memory.
J. Parallel Distributed Comput., 2020

The Entangling Instruction Prefetcher.
IEEE Comput. Archit. Lett., 2020

TLB-based Block-Grain Classification of Private Data.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

Speculative Enforcement of Store Atomicity.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Boosting Store Buffer Efficiency with Store-Prefetch Bursts.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Regional Out-of-Order Writes in Total Store Order.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Way Combination for an Adaptive and Scalable Coherence Directory.
IEEE Trans. Parallel Distributed Syst., 2019

Efficient invisible speculative execution through selective delay and value prediction.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Filter caching for free: the untapped potential of the store-buffer.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Ghost loads: what is the cost of invisible speculation?
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation.
IEEE Trans. Parallel Distributed Syst., 2018

TokenTLB+CUP: A Token-Based Page Classification with Cooperative Usage Prediction.
IEEE Trans. Parallel Distributed Syst., 2018

Non-Speculative Load Reordering in Total Store Ordering.
IEEE Micro, 2018

Mending Fences with Self-Invalidation and Self-Downgrade.
Log. Methods Comput. Sci., 2018

Photonic-based express coherence notifications for many-core CMPs.
J. Parallel Distributed Comput., 2018

The Superfluous Load Queue.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Non-Speculative Store Coalescing in Total Store Order.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics.
IEEE Trans. Parallel Distributed Syst., 2017

TLB-Based Temporality-Aware Classification in CMPs with Multilevel TLBs.
IEEE Trans. Parallel Distributed Syst., 2017

To be silent or not: on the impact of evictions of clean data in cache-coherent multicores.
J. Supercomput., 2017

The Tag Filter Architecture: An energy-efficient cache and directory design.
J. Parallel Distributed Comput., 2017

A dedicated private-shared cache design for scalable multiprocessors.
Concurr. Comput. Pract. Exp., 2017

Non-Speculative Load-Load Reordering in TSO.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Way-combining directory: an adaptive and scalable low-cost coherence directory.
Proceedings of the International Conference on Supercomputing, 2017

Automatic detection of extended data-race-free regions.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016
A Hybrid Static-Dynamic Classification for Dual-Consistency Cache Coherence.
IEEE Trans. Parallel Distributed Syst., 2016

Efficient TLB-Based Detection of Private Pages in Chip Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2016

Are distributed sharing codes a solution to the scalability problem of coherence directories in manycores? An evaluation study.
J. Supercomput., 2016

Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead.
ACM Trans. Archit. Code Optim., 2016

Racer: TSO consistency via race detection.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Splash-3: A properly synchronized benchmark suite for contemporary research.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

TokenTLB: A Token-Based Page Classification Approach.
Proceedings of the 2016 International Conference on Supercomputing, 2016

A Directory Cache with Dynamic Private-Shared Partitioning.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Fencing Programs with Self-Invalidation and Self-Downgrade.
Proceedings of the Formal Techniques for Distributed Objects, Components, and Systems, 2016

Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
PS directory: a scalable multilevel directory cache for CMPs.
J. Supercomput., 2015

PS-Cache: an energy-efficient cache design for chip multiprocessors.
J. Supercomput., 2015

DASC-DIR: a low-overhead coherence directory for many-core processors.
J. Supercomput., 2015

Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses.
IEEE Trans. Computers, 2015

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence.
ACM Trans. Archit. Code Optim., 2015

The Tag Filter Cache: An Energy-Efficient Approach.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Callback: efficient synchronization without invalidation with a directory just for spin-waiting.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A Dual-Consistency Cache Coherence Protocol.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Fast&Furious: A Tool for Detecting Covert Racing.
Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015

Early Experiences with Separate Caches for Private and Shared Data.
Proceedings of the 11th IEEE International Conference on e-Science, 2015

An Efficient, Self-Contained, On-chip Directory: DIR1-SISD.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Characterization of a List-Based Directory Cache Coherence Protocol for Manycore CMPs.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Increasing the Effectiveness of Directory Caches by Avoiding the Tracking of Noncoherent Memory Blocks.
IEEE Trans. Computers, 2013

ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

A new perspective for efficient virtual-cache coherence.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Temporal-Aware Mechanism to Detect Private Data in Chip Multiprocessors.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Efficient Dir0B Cache Coherency for Many-Core CMPs.
Proceedings of the International Conference on Computational Science, 2013

2012
Extending Magny-Cours Cache Coherence.
IEEE Trans. Computers, 2012

DAPSCO: Distance-aware partially shared cache organization.
ACM Trans. Archit. Code Optim., 2012

Efficient, snoopless, System-on-Chip coherence.
Proceedings of the IEEE 25th International SOC Conference, 2012

Using Heterogeneous Networks to Improve Energy Efficiency in Direct Coherence Protocols for Many-Core CMPs.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Cache Miss Characterization in Hierarchical Large-Scale Cache-Coherent Systems.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

ASCIB: adaptive selection of cache indexing bits for removing conflict misses.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

PS-Dir: a scalable two-level directory cache.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Complexity-effective multicore coherence.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Self-related traces: An alternative to full-system simulation for NoCs.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Energy-Efficient Cache Coherence Protocols in Chip-Multiprocessors for Server Consolidation.
Proceedings of the International Conference on Parallel Processing, 2011

2010
A Direct Coherence Protocol for Many-Core Chip Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2010

A scalable organization for distributed directories.
J. Syst. Archit., 2010

EMC²: Extending Magny-Cours coherence for large-scale servers.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Distance-aware round-robin mapping for large NUCA caches.
Proceedings of the 16th International Conference on High Performance Computing, 2009

Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs.
Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009

2008
Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors.
J. Parallel Distributed Comput., 2008

DiCo-CMP: Efficient cache coherency in tiled CMP architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Scalable Directory Organization for Tiled CMP Architectures.
Proceedings of the 2008 International Conference on Computer Design, 2008

2007
Direct Coherence: Bringing Together Performance and Scalability in Shared-Memory Multiprocessors.
Proceedings of the High Performance Computing, 2007

2006
An efficient cache design for scalable glueless shared-memory multiprocessors.
Proceedings of the Third Conference on Computing Frontiers, 2006

2005
A Novel Lightweight Directory Architecture for Scalable Shared-Memory Multiprocessors.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

