Juan Gómez-Luna

Orcid: 0000-0002-6514-1571

According to our database, Juan Gómez-Luna authored at least 131 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
SparseACC: A Generalized Linear Model Accelerator for Sparse Datasets.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2024

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis.
ACM Trans. Archit. Code Optim., March, 2024

PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures.
CoRR, 2024

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing.
CoRR, 2024

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems.
CoRR, 2024

MATSA: An MRAM-Based Energy-Efficient Accelerator for Time Series Analysis.
IEEE Access, 2024

Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Computing.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023
GVLE: a highly optimized GPU-based implementation of variable-length encoding.
J. Supercomput., May, 2023

Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs.
Bioinform., May, 2023

A framework for high-throughput sequence alignment using real processing-in-memory systems.
Bioinform., May, 2023

PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM.
ACM Trans. Archit. Code Optim., March, 2023

ALP: Alleviating CPU-Memory Data Movement Overheads in Memory-Centric Systems.
IEEE Trans. Emerg. Top. Comput., 2023

PULSAR: Simultaneous Many-Row Activation for Reliable and High-Performance Computing in Off-the-Shelf DRAM Chips.
CoRR, 2023

Understanding Read Disturbance in High Bandwidth Memory: An Experimental Analysis of Real HBM2 DRAM Chips.
CoRR, 2023

DaPPA: A Data-Parallel Framework for Processing-in-Memory Architectures.
CoRR, 2023

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems.
CoRR, 2023

Extending Memory Capacity in Modern Consumer Systems With Emerging Non-Volatile Memory: Experimental Analysis and Characterization Using the Intel Optane SSD.
IEEE Access, 2023

Casper: Accelerating Stencil Computations Using Near-Cache Processing.
IEEE Access, 2023

High-Performance and Scalable Agent-Based Simulation with BioDynaMo.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

TransPimLib: Efficient Transcendental Functions for Processing-in-Memory Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Evaluating Machine Learning Workloads on Memory-Centric Computing Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Evaluating Homomorphic Operations on a Real-World Processing-In-Memory System.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation.
Proceedings of the 37th International Conference on Supercomputing, 2023

An Experimental Analysis of RowHammer in HBM2 DRAM Chips.
Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2023

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory.
Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022
pLUTo: Enabling Massively Parallel Computation In DRAM via Lookup Tables.
Dataset, July, 2022

Accelerating Weather Prediction Using Near-Memory Reconfigurable Fabric.
ACM Trans. Reconfigurable Technol. Syst., 2022

CAVLCU: an efficient GPU-based implementation of CAVLC.
J. Supercomput., 2022

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud.
IEEE Micro, 2022

GUD-Canny: a real-time GPU-based unsupervised and distributed Canny edge detector.
J. Real Time Image Process., 2022

Accelerating Time Series Analysis via Processing using Non-Volatile Memories.
CoRR, 2022

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory.
CoRR, 2022

LEAPER: Modeling Cloud FPGA-based Systems via Transfer Learning.
CoRR, 2022

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System.
CoRR, 2022

Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis.
CoRR, 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators.
CoRR, 2022

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System.
IEEE Access, 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the SIGMETRICS/PERFORMANCE '22: ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Mumbai, India, June 6, 2022

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Morpheus: Extending the Last Level Cache Capacity in GPU Systems Using Idle GPU Core Resources.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Methodologies, Workloads, and Tools for Processing-in-Memory: Enabling the Adoption of Data-Centric Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

PiDRAM: An FPGA-based Framework for End-to-end Evaluation of Processing-in-DRAM Techniques.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Machine Learning Training on a Real Processing-in-Memory System.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

Exploiting Near-Data Processing to Accelerate Time Series Analysis.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022

SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Sibyl: adaptive and extensible data placement in hybrid storage systems using online reinforcement learning.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Algorithmic Improvement and GPU Acceleration of the GenASM Algorithm.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications.
IEEE Micro, 2021

Casper: Accelerating Stencil Computation using Near-cache Processing.
CoRR, 2021

Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study.
CoRR, 2021

NERO: Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric.
CoRR, 2021

SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM.
CoRR, 2021

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.
CoRR, 2021

pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation.
CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
CoRR, 2021

BurstLink: Techniques for Energy-Efficient Conventional and Virtual Reality Video Display.
CoRR, 2021

SneakySnake: a fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs.
Bioinform., 2021

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks.
IEEE Access, 2021

BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware.
Proceedings of the 12th International Green and Sustainable Computing Workshops, 2021

Modeling FPGA-Based Systems via Few-Shot Learning.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

SIMDRAM: a framework for bit-serial SIMD processing using DRAM.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
A Modern Primer on Processing in Memory.
CoRR, 2020

Accelerating B-spline interpolation on GPUs: Application to medical image registration.
Comput. Methods Programs Biomed., 2020

Fast parallel vessel segmentation.
Comput. Methods Programs Biomed., 2020

GPU acceleration of liver enhancement for tumor segmentation.
Comput. Methods Programs Biomed., 2020

Accelerating sparse matrix-matrix multiplication with GPU Tensor Cores.
Comput. Electr. Eng., 2020

Accelerating Chan-Vese model with cross-modality guided contrast enhancement for liver segmentation.
Comput. Biol. Medicine, 2020

FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

NATSA: A Near-Data Processing Accelerator for Time Series Analysis.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

2019
Processing data where it makes sense: Enabling in-memory computation.
Microprocess. Microsystems, 2019

Processing-in-memory: A workload-driven perspective.
IBM J. Res. Dev., 2019

A Workload and Programming Ease Driven Perspective of Processing-in-Memory.
CoRR, 2019

Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices.
CoRR, 2019

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Enabling Practical Processing in and near Memory for Data-Intensive Computing.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
High-throughput Ant Colony Optimization on graphics processing units.
J. Parallel Distributed Comput., 2018

High-Performance Computation of Bézier Surfaces on Parallel and Heterogeneous Platforms.
Int. J. Parallel Program., 2018

Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions.
CoRR, 2018

Improving tasks throughput on accelerators using OpenCL command concurrency.
CoRR, 2018

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

2017
A tasks reordering model to reduce transfers overhead on GPUs.
J. Parallel Distributed Comput., 2017

Collaborative Computing for Heterogeneous Integrated Systems.
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

Chai: Collaborative heterogeneous applications for integrated-architectures.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Efficient OpenCL-based concurrent tasks offloading on accelerators.
Proceedings of the International Conference on Computational Science, 2017

2016
In-Place Matrix Transposition on GPUs.
IEEE Trans. Parallel Distributed Syst., 2016

Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs.
IEEE Trans. Computers, 2016

A programming system for future proofing performance critical libraries.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Efficient kernel synthesis for performance portable programming.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
Calculation of dense trajectory descriptors on a heterogeneous embedded architecture.
J. Syst. Archit., 2015

In-Place Data Sliding Algorithms for Many-Core Architectures.
Proceedings of the 44th International Conference on Parallel Processing, 2015

2014
In-place transposition of rectangular matrices on accelerators.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Low-textured regions detection for improving stereoscopy algorithms.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

CUVLE: Variable-length encoding on CUDA.
Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing, 2014

2013
Performance Modeling of Atomic Additions on GPU Scratchpad Memory.
IEEE Trans. Parallel Distributed Syst., 2013

An optimized approach to histogram computation on GPU.
Mach. Vis. Appl., 2013

A robust and low resource FPGA-based stereoscopic vision algorithm.
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2013

Simulation and architecture improvements of atomic operations on GPU scratchpad memory.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

2012
Performance models for asynchronous data transfers on consumer Graphics Processing Units.
J. Parallel Distributed Comput., 2012

2011
Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study.
Int. J. High Perform. Comput. Appl., 2011

simARQ, An Automatic Repeat Request Simulator for Teaching Purposes.
Proceedings of the IT Revolutions, 2011

Egomotion compensation and moving objects detection algorithm on GPU.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

2010
Parallelizing and Optimizing LIP-Canny Using NVIDIA CUDA.
Proceedings of the Trends in Applied Intelligent Systems, 2010

2009
MESI Cache Coherence Simulator for Teaching Purposes.
CLEI Electron. J., 2009

FPGA Implementation of the Generalized Hough Transform.
Proceedings of the ReConFig'09: 2009 International Conference on Reconfigurable Computing and FPGAs, 2009

Parallelization of a Video Segmentation Algorithm on CUDA-Enabled Graphics Processing Units.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008
Biprocessor SoC in an FPGA for Teaching Purposes.
Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies, 2008

