José L. Abellán

Orcid: 0000-0003-3550-720X

According to our database, José L. Abellán authored at least 59 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations.
Proc. ACM Meas. Anal. Comput. Syst., 2024

AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
STIFT: A Spatio-Temporal Integrated Folding Tree for Efficient Reductions in Flexible DNN Accelerators.
ACM J. Emerg. Technol. Comput. Syst., October, 2023

Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy.
ACM Trans. Archit. Code Optim., March, 2023

Accelerating Finite Field Arithmetic for Homomorphic Encryption on GPUs.
IEEE Micro, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
CoRR, 2023

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Spartan: A Sparsity-Adaptive Framework to Accelerate Deep Neural Network Training on GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

A Taxonomy for Classification and Comparison of Dataflows for GNN Accelerators.
CoRR, 2021

STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators.
IEEE Comput. Archit. Lett., 2021

METADOCK 2: a high-throughput parallel metaheuristic scheme for molecular docking.
Bioinform., 2021

The Challenge of Classification Confidence Estimation in Dynamically-Adaptive Neural Networks.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

TAP-2.5D: A Thermally-Aware Chiplet Placement Methodology for 2.5D Systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
MGPU-TSM: A Multi-GPU System with Truly Shared Memory.
CoRR, 2020

HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems.
CoRR, 2020

STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators.
CoRR, 2020

QN-Docking: An innovative molecular docking methodology based on Q-Networks.
Appl. Soft Comput., 2020

Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Valkyrie: Leveraging Inter-TLB Locality to Enhance GPU Performance.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
InsideNet: A tool for characterizing convolutional neural networks.
Future Gener. Comput. Syst., 2019

MGPUSim: enabling multi-GPU performance modeling and optimization.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

CNN-SIM: A Detailed Arquitectural Simulator of CNN Accelerators.
Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

2018
High-throughput Ant Colony Optimization on graphics processing units.
J. Parallel Distributed Comput., 2018

Photonic-based express coherence notifications for many-core CMPs.
J. Parallel Distributed Comput., 2018

MGSim + MGMark: A Framework for Multi-GPU System Research.
CoRR, 2018

Profiling DNN Workloads on a Volta-based DGX-1 System.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Accelerating Drugs Discovery with Deep Reinforcement Learning: An Early Approach.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Adaptive Tuning of Photonic Devices in a Photonic NoC Through Dynamic Workload Allocation.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Secure communications in wireless network-on-chips.
Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, 2017

2016
UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs.
ACM Trans. Archit. Code Optim., 2016

Electro-Photonic NoC Designs for Kilocore Systems.
ACM J. Emerg. Technol. Comput. Syst., 2016

2015
Efficient Hardware-Supported Synchronization Mechanisms for Manycores.
Proceedings of the Handbook on Data Centers, 2015

Fast and efficient commits for Lazy-Lazy hardware transactional memory.
J. Supercomput., 2015

Managing Laser Power in Silicon-Photonic NoC Through Cache and NoC Reconfiguration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Asymmetric NoC Architectures for GPU Systems.
Proceedings of the 9th International Symposium on Networks-on-Chip, 2015

Enhancing the Parallelization of Non-bonded Interactions Kernel for Virtual Screening on GPUs.
Proceedings of the Bioinformatics and Biomedical Engineering, 2015

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014
Thermal management of manycore systems with silicon-photonic networks.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Design of an efficient communication infrastructure for highly contended locks in many-core CMPs.
J. Parallel Distributed Comput., 2013

ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Efficient Dir0B Cache Coherency for Many-Core CMPs.
Proceedings of the International Conference on Computational Science, 2013

Deploying Hardware Locks to Improve Performance and Energy Efficiency of Hardware Transactional Memory.
Proceedings of the Architecture of Computing Systems - ARCS 2013, 2013

2012
Efficient Hardware Barrier Synchronization in Many-Core CMPs.
IEEE Trans. Parallel Distributed Syst., 2012

Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE.
J. Supercomput., 2012

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011
GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
Characterizing the basic synchronization and communication operations in Dual Cell-based Blades through CellStats.
J. Supercomput., 2010

A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Efficient and scalable barrier synchronization for many-core CMPs.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2008
CellStats: A Tool to Evaluate the Basic Synchronization and Communication Operations of the Cell BE.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Characterizing the Basic Synchronization and Communication Operations in Dual Cell-Based Blades.
Proceedings of the Computational Science, 2008

Multicore Platforms for Scientific Computing: Cell BE and NVIDIA Tesla.
Proceedings of the 2008 International Conference on Scientific Computing, 2008

