José L. Abellán

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023

STIFT: A Spatio-Temporal Integrated Folding Tree for Efficient Reductions in Flexible DNN Accelerators.

ACM J. Emerg. Technol. Comput. Syst., October, 2023

Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy.

Furkan Eris

Marcia S. Louis

Kubra Eris

José Luis Abellán Miguel

Ajay Joshi

ACM Trans. Archit. Code Optim., March, 2023

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.

IEEE Micro, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.

Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators.

Raveesh Garg

Eric Qin

Sivasankaran Rajamanickam

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs.

Shi Dong

Yifan Sun

IEEE Trans. Parallel Distributed Syst., 2021

A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators.

Raveesh Garg

Eric Qin

Sivasankaran Rajamanickam

CoRR, 2021

METADOCK 2: a high-throughput parallel metaheuristic scheme for molecular docking.

Bioinform., 2021

The Challenge of Classification Confidence Estimation in Dynamically-Adaptive Neural Networks.

Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators.

Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators.

Proceedings of the IEEE International Symposium on Workload Characterization, 2021

TAP-2.5D: A Thermally-Aware Chiplet Placement Methodology for 2.5D Systems.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020

MGPU-TSM: A Multi-GPU System with Truly Shared Memory.

CoRR, 2020

HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.

CoRR, 2020

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators.

CoRR, 2020

QN-Docking: An innovative molecular docking methodology based on Q-Networks.

Antonio Serrano

Baldomero Imbernón

Andrés Bueno-Crespo

Appl. Soft Comput., 2020

Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC.

Shi Dong

Elmira Karimi

Marti Torrents Lapuerta

José Cano

David R. Kaeli

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

InsideNet: A tool for characterizing convolutional neural networks.

Future Gener. Comput. Syst., 2019

MGPUSim: enabling multi-GPU performance modeling and optimization.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

CNN-SIM: A Detailed Arquitectural Simulator of CNN Accelerators.

Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

2018

High-throughput Ant Colony Optimization on graphics processing units.

J. Parallel Distributed Comput., 2018

Photonic-based express coherence notifications for many-core CMPs.

J. Parallel Distributed Comput., 2018

MGSim + MGMark: A Framework for Multi-GPU System Research.

CoRR, 2018

Profiling DNN Workloads on a Volta-based DGX-1 System.

Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Accelerating Drugs Discovery with Deep Reinforcement Learning: An Early Approach.

Antonio Serrano

Baldomero Imbernón

Andrés Bueno-Crespo

Fernando Pereñíguez-Garcia

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Adaptive Tuning of Photonic Devices in a Photonic NoC Through Dynamic Workload Allocation.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Secure communications in wireless network-on-chips.

Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, 2017

2016

UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.

ACM Trans. Archit. Code Optim., 2016

Electro-Photonic NoC Designs for Kilocore Systems.

Chao Chen

Ajay Joshi

ACM J. Emerg. Technol. Comput. Syst., 2016

2015

Efficient Hardware-Supported Synchronization Mechanisms for Manycores.

Proceedings of the Handbook on Data Centers, 2015

Fast and efficient commits for Lazy-Lazy hardware transactional memory.

Epifanio Gaona-Ramírez

J. Supercomput., 2015

Managing Laser Power in Silicon-Photonic NoC Through Cache and NoC Reconfiguration.

Chao Chen

Ajay Joshi

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Asymmetric NoC Architectures for GPU Systems.

Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

Enhancing the Parallelization of Non-bonded Interactions Kernel for Virtual Screening on GPUs.

Proceedings of the Bioinformatics and Biomedical Engineering, 2015

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014

Thermal management of manycore systems with silicon-photonic networks.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Design of an efficient communication infrastructure for highly contended locks in many-core CMPs.

J. Parallel Distributed Comput., 2013

ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs.

Alberto Ros

Juan Fernández Peinador

Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Efficient Dir0B Cache Coherency for Many-Core CMPs.

Proceedings of the International Conference on Computational Science, 2013

Deploying Hardware Locks to Improve Performance and Energy Efficiency of Hardware Transactional Memory.

Epifanio Gaona-Ramírez

Proceedings of the Architecture of Computing Systems - ARCS 2013, 2013

2012

Efficient Hardware Barrier Synchronization in Many-Core CMPs.

IEEE Trans. Parallel Distributed Syst., 2012

Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE.

J. Supercomput., 2012

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs.

Juan Fernández Peinador

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011

GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

Characterizing the basic synchronization and communication operations in Dual Cell-based Blades through CellStats.

J. Supercomput., 2010

A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs.

Proceedings of the 39th International Conference on Parallel Processing, 2010

Efficient and scalable barrier synchronization for many-core CMPs.

Proceedings of the 7th Conference on Computing Frontiers, 2010

2008

CellStats: A Tool to Evaluate the Basic Synchronization and Communication Operations of the Cell BE.

Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Characterizing the Basic Synchronization and Communication Operations in Dual Cell-Based Blades.
