Miquel Pericàs

Orcid: 0000-0002-7583-6609

According to our database, Miquel Pericàs authored at least 75 papers between 2003 and 2023.

Bibliography

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Challenges and Opportunities in the Co-design of Convolutions and RISC-V Vector Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Analysis and Characterization of Performance Variability for OpenMP Runtime.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Accelerating CNN inference on long vector architectures via co-design.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

ODIN: Overcoming Dynamic Interference in iNference Pipelines.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

2022
Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications.
ACM Trans. Archit. Code Optim., 2022

ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes.
ACM Trans. Archit. Code Optim., 2022

Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints.
ACM Trans. Archit. Code Optim., 2022

Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs.
CoRR, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

2021
Mitigating inefficient task mappings with an Adaptive Resource-Moldable Scheduler (ARMS).
CoRR, 2021

Vectorized Barrier and Reduction in LLVM OpenMP Runtime.
Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

An online guided tuning approach to run CNN pipelines on edge devices.
Proceedings of the CF '21: Computing Frontiers Conference, 2021

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020
Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems.
J. Parallel Distributed Comput., 2020

Proceedings of the Thirteenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2020).
CoRR, 2020

Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020

Enhancing Multithreaded Performance of Asymmetric Multicores with SIMD Offloading.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Enhancing Thread-Level Parallelism in Asymmetric Multicores using Transparent Instruction Offloading.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.
CoRR, 2019

An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity.
CoRR, 2019

High performance scheduling of mixed-mode DAGs on heterogeneous multicores.
CoRR, 2019

QoS-Driven Coordinated Management of Resources to Save Energy in Multi-core Systems.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Elastic Places: An Adaptive Resource Manager for Scalable and Portable Performance.
ACM Trans. Archit. Code Optim., 2018

Global Dead-Block Management for Task-Parallel Programs.
ACM Trans. Archit. Code Optim., 2018

2017
Trends in Data Locality Abstractions for HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Runtime-Assisted Global Cache Management for Task-Based Parallel Programs.
IEEE Comput. Archit. Lett., 2017

SWAS: Stealing Work Using Approximate System-Load Information.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

2016
Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

RADAR: Runtime-assisted dead region management for last-level caches.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

POSTER: ξ-TAO: A Cache-centric Execution Model and Runtime for Deep Parallel Multicore Topologies.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
DAGViz: a DAG visualization tool for analyzing task-parallel program traces.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Self-Tuned Software-Managed Energy Reduction in InfiniBand Links.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

2014
Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Scalable analysis of multicore data reuse and sharing.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Software-Managed Power Reduction in Infiniband Links.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Efficient String Sorting on Multi- and Many-Core Architectures.
Proceedings of the 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27, 2014

2013
Guest editorial: Workshop on Reconfigurable Computing.
J. Syst. Archit., 2013

A template system for the efficient compilation of domain abstractions onto reconfigurable computers.
J. Syst. Archit., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of Data Reuse in Task-Parallel Runtimes.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

2012
Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PPMC: Hardware scheduling and memory management support for multi accelerators.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

BSArc: blacksmith streaming architecture for HPC accelerators.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

PPMC: A Programmable Pattern Based Memory Controller.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011
Assessing Accelerator-Based HPC Reverse Time Migration.
IEEE Trans. Parallel Distributed Syst., 2011

Implementation of a hierarchical N-body simulator using the OmpSs programming model.
Proceedings of the first workshop on Irregular applications: architectures and algorithms, 2011

TARCAD: A template architecture for reconfigurable accelerator designs.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

2010
FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

2009
Exploiting memory customization in FPGA for 3D stencil computations.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

2008
Affordable kilo-instruction processors.
PhD thesis, 2008

Power-efficient VLIW design using clustering and widening.
Int. J. Embed. Syst., 2008

Vectorized AES Core for High-throughput Secure Environments.
Proceedings of the High Performance Computing for Computational Science, 2008

A Two-Level Load/Store Queue Based on Execution Locality.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007
A Flexible Heterogeneous Multi-Core Architecture.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
A decoupled KILO-instruction processor.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2005
Kilo-Instruction Processors: Overcoming the Memory Wall.
IEEE Micro, 2005

Chained In-Order/Out-of-Order DoubleCore Architecture.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Decoupled State-Execute Architecture.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

An asymmetric clustered processor based on value content.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

2004
High-performance and low-power VLIW cores for numerical computations.
Int. J. High Perform. Comput. Netw., 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.
Proceedings of the Computer Systems: Architectures, 2004

An Optimized Front-End Physical Register File with Banking and Writeback Filtering.
Proceedings of the Power-Aware Computer Systems, 4th International Workshop, 2004

2003
Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

