Julio Sahuquillo

Orcid: 0000-0001-8630-4846

Affiliations:
  • Polytechnic University of Valencia, Spain


According to our database, Julio Sahuquillo authored at least 174 papers between 1998 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Main memory controller with multiple media technologies for big data workloads.
J. Big Data, 2023

Cloud White: Detecting and Estimating QoS Degradation of Latency-Critical Workloads in the Public Cloud.
Future Gener. Comput. Syst., 2023

SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors.
CoRR, 2023

Stratus: A Hardware/Software Infrastructure for Controlled Cloud Research.
Proceedings of the 31st Euromicro International Conference on Parallel, 2023

Dynamic Allocation of Processor Cores to Graph Applications on Commodity Servers.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
DeepP: Deep Learning Multi-Program Prefetch Configuration for the IBM POWER 8.
IEEE Trans. Computers, 2022

VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors.
IEEE Trans. Computers, 2022

Effect of Hyper-Threading in Latency-Critical Multithreaded Cloud Applications and Utilization Analysis of the Major System Resources.
Future Gener. Comput. Syst., 2022

A Neural Network to Estimate Isolated Performance from Multi-Program Execution.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022

Fast-track cache: a huge racetrack memory L1 data cache.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning.
Proceedings of the 51st International Conference on Parallel Processing, 2022


2021
Hy-Sched: A Simple Hyperthreading-Aware Thread to Core Allocation Strategy.
IEEE Comput. Archit. Lett., 2021

Segment Switching: A New Switching Strategy for Optical HPC Networks.
IEEE Access, 2021

2020
Phase-Aware Cache Partitioning to Target Both Turnaround Time and System Performance.
IEEE Trans. Parallel Distributed Syst., 2020

Bandwidth-Aware Dynamic Prefetch Configuration for IBM POWER8.
IEEE Trans. Parallel Distributed Syst., 2020

Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors.
IEEE Trans. Parallel Distributed Syst., 2020

An efficient cache flat storage organization for multithreaded workloads for low power processors.
Future Gener. Comput. Syst., 2020

Understanding Cloud Workloads Performance in a Production like Environment.
CoRR, 2020

Impact of the Array Shape and Memory Bandwidth on the Execution Time of CNN Systolic Arrays.
Proceedings of the 23rd Euromicro Conference on Digital System Design, 2020

2019
Way Combination for an Adaptive and Scalable Coherence Directory.
IEEE Trans. Parallel Distributed Syst., 2019

FOS: a low-power cache organization for multicores.
J. Supercomput., 2019

An Aging-Aware GPU Register File Design Based on Data Redundancy.
IEEE Trans. Computers, 2019

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance.
IEEE Trans. Computers, 2019

Modeling and analysis of the performance of exascale photonic networks.
Concurr. Comput. Pract. Exp., 2019

Foreword to the Special Issue on Processors, Interconnects, Storage, and Caches for Exascale Systems.
Concurr. Comput. Pract. Exp., 2019

2018
Efficient selective multicore prefetching under limited memory bandwidth.
J. Parallel Distributed Comput., 2018

Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective.
J. Parallel Distributed Comput., 2018

Accurately modeling the on-chip and off-chip GPU memory subsystem.
Future Gener. Comput. Syst., 2018

A Workload Generator for Evaluating SMT Real-Time Systems.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Workload Characterization for Exascale Computing Networks.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Improving System Turnaround Time with Intel CAT by Identifying LLC Critical Applications.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017
On Microarchitectural Mechanisms for Cache Wearout Reduction.
IEEE Trans. Very Large Scale Integr. Syst., 2017

A Hardware Approach to Fairly Balance the Inter-Thread Interference in Shared Caches.
IEEE Trans. Parallel Distributed Syst., 2017

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling.
IEEE Trans. Parallel Distributed Syst., 2017

Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores.
IEEE Trans. Computers, 2017

The Tag Filter Architecture: An energy-efficient cache and directory design.
J. Parallel Distributed Comput., 2017

A research-oriented course on Advanced Multicore Architecture: Contents and active learning methodologies.
J. Parallel Distributed Comput., 2017

Exploiting Data Compression to Mitigate Aging in GPU Register Files.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Modeling a Photonic Network for Exascale Computing.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Application Clustering Policies to Address System Fairness with Intel's Cache Allocation Technology.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Bandwidth-Aware On-Line Scheduling in SMT Multicores.
IEEE Trans. Computers, 2016

A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks.
Future Gener. Comput. Syst., 2016

Enhancing the L1 Data Cache Design to Mitigate HCI.
IEEE Comput. Archit. Lett., 2016

A Simple Activation/Deactivation Prefetching Scheme for Chip Multiprocessors.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Impact of Memory-Level Parallelism on the Performance of GPU Coherence Protocols.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Accurately modeling a photonic NoC in a detailed CMP simulation framework.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016

Symbiotic job scheduling on the IBM POWER8.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

A Directory Cache with Dynamic Private-Shared Partitioning.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016


Student Research Poster: A Low Complexity Cache Sharing Mechanism to Address System Fairness.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
PS directory: a scalable multilevel directory cache for CMPs.
J. Supercomput., 2015

PS-Cache: an energy-efficient cache design for chip multiprocessors.
J. Supercomput., 2015

Design of Hybrid Second-Level Caches.
IEEE Trans. Computers, 2015

A reuse-based refresh policy for energy-aware eDRAM caches.
Microprocess. Microsystems, 2015

Surfing the Web Using Browser Interface Facilities: A Performance Evaluation Approach.
J. Web Eng., 2015

Bringing real processors to labs.
Comput. Appl. Eng. Educ., 2015

The Tag Filter Cache: An Energy-Efficient Approach.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Methodologies and Performance Metrics to Evaluate Multiprogram Workloads.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Row Tables: Design Choices to Exploit Bank Locality in Multiprogram Workloads.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

A Research-Oriented Course on Advanced Multicore Architecture.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Addressing Fairness in SMT Multicores with a Progress-Aware Scheduler.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Accurately modeling the GPU memory subsystem.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Impact of Partitioning Cache Schemes on the Cache Hierarchy of SMT Processors.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

2014
Efficient Register Renaming and Recovery for High-Performance Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Cache-Hierarchy Contention-Aware Scheduling in CMPs.
IEEE Trans. Parallel Distributed Syst., 2014

Addressing bandwidth contention in SMT multicores through scheduling.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Dynamic WCET Estimation for Real-Time Multicore Embedded Systems Supporting DVFS.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Analyzing the Optimal Voltage/Frequency Pair in Fault-Tolerant Caches.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

2013
Hardware-Based Generation of Independent Subtraces of Instructions in Clustered Processors.
IEEE Trans. Computers, 2013

Power-aware scheduling with effective task migration for real-time multicore embedded systems.
Concurr. Comput. Pract. Exp., 2013

Analyzing web server performance under dynamic user workloads.
Comput. Commun., 2013

Referrer Graph: A cost-effective algorithm and pruning method for predicting web accesses.
Comput. Commun., 2013

The impact of user-browser interaction on web performance.
Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

A New Methodology for Studying Realistic Processors in Computer Science Degrees.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches.
Proceedings of the International Conference on Supercomputing, 2013

Using Huge Pages and Performance Counters to Determine the LLC Architecture.
Proceedings of the International Conference on Computational Science, 2013

Drowsy cache partitioning for reduced static and dynamic energy in the cache hierarchy.
Proceedings of the International Green Computing Conference, 2013

Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes.
Proceedings of the Design, Automation and Test in Europe, 2013

L1-bandwidth aware thread allocation in multicore SMT processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches.
IEEE Trans. Very Large Scale Integr. Syst., 2012

A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions.
IEEE Trans. Parallel Distributed Syst., 2012

A cost-effective heuristic to schedule local and remote memory in cluster computers.
J. Supercomput., 2012

Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches.
IEEE Trans. Computers, 2012

Combining recency of information with selective random and a victim cache in last-level caches.
ACM Trans. Archit. Code Optim., 2012

Prediction Algorithms for Prefetching in the Current Web.
J. Web Eng., 2012

Key factors in web latency savings in an experimental prefetching system.
J. Intell. Inf. Syst., 2012

A taxonomy of web prediction algorithms.
Expert Syst. Appl., 2012

Providing TPC-W with web user dynamic behavior.
CLEI Electron. J., 2012

Efficiently Handling Memory Accesses to Improve QoS in Multicore Systems under Real-Time Constraints.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

The Impact of User's Dynamic Behavior on Web Performance.
Proceedings of the 11th IEEE International Symposium on Network Computing and Applications, 2012

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Page-Based Memory Allocation Policies of Local and Remote Memory in Cluster Computers.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Analyzing the optimal ratio of SRAM banks in hybrid caches.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

OMHI 2012: First International Workshop on On-chip Memory Hierarchies and Interconnects: Organization, Management and Implementation.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Effects of Process Variation on the Access Time in SRAM Cells.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

PS-Dir: a scalable two-level directory cache.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems.
Comput. J., 2011

Web Workload Generators - A Survey Focusing on user Dynamism Representation.
Proceedings of the WEBIST 2011, 2011

MRU-Tour-based Replacement Algorithms for Last-Level Caches.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

A Cluster Computer Performance Predictor for Memory Scheduling.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2011

A Dynamic Power-Aware Partitioner with Task Migration for Multicore Embedded Systems.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Energy Behaviour of NUCA Caches in CMPs.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Using current web page structure to improve prefetching performance.
Comput. Networks, 2010

Referrer graph: a low-cost web prediction algorithm.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Dynamic task set partitioning based on balancing resource requirements and utilization to reduce power consumption.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Balancing Task Resource Requirements in Embedded Multithreaded Multicore Processors to Reduce Power Consumption.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

Speculative Validation of Web Objects for Further Reducing the User-Perceived Latency.
Proceedings of the NETWORKING 2010, 2010

Out-of-order retirement of instructions in sequentially consistent multiprocessors.
Proceedings of the 28th International Conference on Computer Design, 2010

Extending a Multicore Multithread Simulator to Model Power-Aware Hard Real-Time Systems.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

A Scheduling Heuristic to Handle Local and Remote Memory in Cluster Computers.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Exploiting subtrace-level parallelism in clustered processors.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
A Complexity-Effective Out-of-Order Retirement Microarchitecture.
IEEE Trans. Computers, 2009

Power Reduction In Advanced Embedded IPC Processors.
Intell. Autom. Soft Comput., 2009

Dweb model: Representing Web 2.0 dynamism.
Comput. Commun., 2009

An Empirical Study on Maximum Latency Saving in Web Prefetching.
Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, 2009

An hybrid eDRAM/SRAM macrocell to implement first-level data caches.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Dynamic task set partitioning based on balancing memory requirements to reduce power consumption.
Proceedings of the 23rd international conference on Supercomputing, 2009

A power-aware hybrid RAM-CAM renaming mechanism for fast recovery.
Proceedings of the 27th International Conference on Computer Design, 2009

Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2008
The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A simple power-aware scheduling for multicore systems when running real-time applications.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot.
Proceedings of the Euro-Par 2008, 2008

2007
Spim-Cache: A Pedagogical Tool for Teaching Cache Memories Through Code-Based Exercises.
IEEE Trans. Educ., 2007

A user-focused evaluation of web prefetching algorithms.
Comput. Commun., 2007

Analysis of Web-Proxy Cache Replacement Algorithms under Steady-state Conditions.
Proceedings of the WEBIST 2007, 2007

Understanding cache hierarchy interactions with a program-driven simulator.
Proceedings of the 2007 Workshop on Computer Architecture Education, 2007

Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Web prefetch performance evaluation in a real environment.
Proceedings of the 4th International IFIP/ACM Latin American Networking Conference, 2007

Leakage Current Reduction in Data Caches on Embedded Systems.
Proceedings of the 2007 International Conference on Intelligent Pervasive Computing, 2007

Delfos: the Oracle to Predict Next Web User's Accesses.
Proceedings of the 21st International Conference on Advanced Information Networking and Applications (AINA 2007), 2007

VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Addressing a workload characterization study to the design of consistency protocols.
J. Supercomput., 2006

RACFP: a training tool to work with floating-point representation, algorithms, and circuits in undergraduate courses.
IEEE Trans. Educ., 2006

Web prefetching performance metrics: A survey.
Perform. Evaluation, 2006

The Impact of the Web Prefetching Architecture on the Limits of Reducing User's Perceived Latency.
Proceedings of the 2006 IEEE / WIC / ACM International Conference on Web Intelligence (WI 2006), 2006

An execution-driven simulation tool for teaching cache memories in introductory computer organization courses.
Proceedings of the 2006 Workshop on Computer Architecture Education, 2006

Applying the zeros switch-off technique to reduce static energy in data caches.
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Cost-Benefit Analysis of Web Prefetching Algorithms from the User's Point of View.
Proceedings of the NETWORKING 2006, 2006

DDG: An Efficient Prefetching Algorithm for Current Web Generation.
Proceedings of the 1st IEEE Workshop on Hot Topics in Web Systems and Technologies, 2006

Design keys to adapt web prefetching algorithms to environment conditions.
Proceedings of the First International Conference on COMmunication System softWAre and MiddlewaRE (COMSWARE 2006), 2006

2005
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures.
IEEE Trans. Parallel Distributed Syst., 2005

Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors.
J. Syst. Archit., 2005

Modelling users' dynamic behaviour in e-business environments using navigations.
Int. J. Electron. Bus., 2005

Modeling continuous changes of the user's dynamic behavior in the WWW.
Proceedings of the Fifth International Workshop on Software and Performance, 2005

A Comparison Study of the HLRC-DU Protocol versus a HLRC Hardware Assisted Protocol.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Emulating Web Cache Replacement Algorithms versus a Real System.
Proceedings of the 10th IEEE Symposium on Computers and Communications (ISCC 2005), 2005

CARENA: a tool to capture and replay Web navigation sessions.
Proceedings of the Third IEEE/IFIP Workshop on End-to-End Monitoring Techniques and Services, 2005

Exploiting temporal locality in drowsy cache policies.
Proceedings of the Second Conference on Computing Frontiers, 2005

Performance Comparison of a Web Cache Simulation Framework.
Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA 2005), 2005

2004
Characterizing the Dynamic Behavior of Workload Execution in SVM systems.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

The Multikey Web Cache Simulator: A Platform for Designing Proxy Cache Management Techniques.
Proceedings of the 12th Euromicro Workshop on Parallel, 2004

About the Heterogeneity of Web Prefetching Performance Key Metrics.
Proceedings of the Intelligence in Communication Systems, IFIP International Conference, 2004

An Experimental Framework for Testing Web Prefetching Techniques.
Proceedings of the 30th EUROMICRO Conference 2004, 31 August, 2004

2002
A lab course of computer organization.
Proceedings of the 2002 workshop on Computer architecture education, 2002

Characterizing Parallel Workloads to Reduce Multiple Writer Overhead in Shared Virtual Memory Systems.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Efficient Interconnects for Clustered Microarchitectures.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
About the sensitivity of the HLRC-DU protocol on diff and page sizes.
Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software, 2001

XEDU, A Framework for Developing XML-Based Didactic Resources.
Proceedings of the 27th EUROMICRO Conference 2001: A Net Odyssey, 2001

2000
LIDE: a simulation environment for shared virtual memory systems.
SIGARCH Comput. Archit. News, 2000

Splitting the data cache: a survey.
IEEE Concurr., 2000

The differences between distributed shared memory caching and proxy caching.
IEEE Concurr., 2000

Self-similarity in SPLASH-2 workloads on shared memory multiprocessors systems.
Proceedings of the Eight Euromicro Workshop on Parallel and Distributed Processing, 2000

Two management approaches of the split data cache in multiprocessor systems.
Proceedings of the Eight Euromicro Workshop on Parallel and Distributed Processing, 2000

The Filter Data Cache: A Tour Management Comparison with Related Split Data Cache Schemes Sensitive to Data Localities.
Proceedings of the High Performance Computing, Third International Symposium, 2000

WWW Client/Server Traffic Characterization: A Proxy Server Point of View.
Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS-33), 2000

Designing Competitive Coherence Protocols Taking Advantage of Reuse Information.
Proceedings of the 26th EUROMICRO 2000 Conference, 2000

1999
The split data cache in multiprocessor systems: an initial hit ratio analysis.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999

The Filter Cache: A Run-Time Cache Management Approach.
Proceedings of the 25th EUROMICRO '99 Conference, 1999

1998
Impact of Reducing Miss Write Latencies in Multiprocessors with Two Level Cache.
Proceedings of the 24th EUROMICRO '98 Conference, 1998

