Rafael Ubal

According to our database, Rafael Ubal authored at least 28 papers between 2006 and 2019.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.







MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018

Multi2Sim Kepler: A detailed architectural GPU simulator.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Hardware Support for Scratchpad Memory Transactions on GPU Architectures.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

TwinKernels: an execution model to improve GPU hardware scheduling at compile time.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016

Visualization of OpenCL application execution on CPU-GPU systems.
Proceedings of the Workshop on Computer Architecture Education, 2015

A framework for visualization of OpenCL applications execution: a tutorial.
Proceedings of the 3rd International Workshop on OpenCL, 2015

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Efficient Register Renaming and Recovery for High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Exploring the Heterogeneous Design Space for both Performance and Reliability.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Hardware-Based Generation of Independent Subtraces of Instructions in Clustered Processors.
IEEE Trans. Computers, 2013

A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions.
IEEE Trans. Parallel Distributed Syst., 2012

Page-Based Memory Allocation Policies of Local and Remote Memory in Cluster Computers.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Multi2Sim: a simulation framework for CPU-GPU computing.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Out-of-order retirement of instructions in sequentially consistent multiprocessors.
Proceedings of the 28th International Conference on Computer Design, 2010

Exploiting subtrace-level parallelism in clustered processors.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

A Complexity-Effective Out-of-Order Retirement Microarchitecture.
IEEE Trans. Computers, 2009

Power Reduction In Advanced Embedded IPC Processors.
Intell. Autom. Soft Comput., 2009

A power-aware hybrid RAM-CAM renaming mechanism for fast recovery.
Proceedings of the 27th International Conference on Computer Design, 2009

Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Leakage Current Reduction in Data Caches on Embedded Systems.
Proceedings of the 2007 International Conference on Intelligent Pervasive Computing, 2007

VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

RACFP: a training tool to work with floating-point representation, algorithms, and circuits in undergraduate courses.
IEEE Trans. Educ., 2006

Applying the zeros switch-off technique to reduce static energy in data caches.
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006