Juan Gómez-Luna

According to our database, Juan Gómez-Luna authored at least 54 papers between 2008 and 2020.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2020
NATSA: A Near-Data Processing Accelerator for Time Series Analysis.
CoRR, 2020

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
CoRR, 2020

Accelerating B-spline interpolation on GPUs: Application to medical image registration.
Comput. Methods Programs Biomed., 2020

Fast parallel vessel segmentation.
Comput. Methods Programs Biomed., 2020

GPU acceleration of liver enhancement for tumor segmentation.
Comput. Methods Programs Biomed., 2020

Accelerating Chan-Vese model with cross-modality guided contrast enhancement for liver segmentation.
Comput. Biol. Medicine, 2020

FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

2019
Processing data where it makes sense: Enabling in-memory computation.
Microprocess. Microsystems, 2019

Processing-in-memory: A workload-driven perspective.
IBM J. Res. Dev., 2019

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs.
CoRR, 2019

A Workload and Programming Ease Driven Perspective of Processing-in-Memory.
CoRR, 2019

Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices.
CoRR, 2019

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures.
Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Enabling Practical Processing in and near Memory for Data-Intensive Computing.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
High-throughput Ant Colony Optimization on graphics processing units.
J. Parallel Distributed Comput., 2018

High-Performance Computation of Bézier Surfaces on Parallel and Heterogeneous Platforms.
Int. J. Parallel Program., 2018

Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions.
CoRR, 2018

Improving tasks throughput on accelerators using OpenCL command concurrency.
CoRR, 2018

FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices.
Proceedings of the 16th USENIX Conference on File and Storage Technologies, 2018

2017
A tasks reordering model to reduce transfers overhead on GPUs.
J. Parallel Distributed Comput., 2017

Collaborative Computing for Heterogeneous Integrated Systems.
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

Chai: Collaborative heterogeneous applications for integrated-architectures.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Efficient OpenCL-based concurrent tasks offloading on accelerators.
Proceedings of the International Conference on Computational Science, 2017

2016
In-Place Matrix Transposition on GPUs.
IEEE Trans. Parallel Distributed Syst., 2016

Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs.
IEEE Trans. Computers, 2016

A programming system for future proofing performance critical libraries.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Efficient kernel synthesis for performance portable programming.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
Calculation of dense trajectory descriptors on a heterogeneous embedded architecture.
J. Syst. Archit., 2015

In-Place Data Sliding Algorithms for Many-Core Architectures.
Proceedings of the 44th International Conference on Parallel Processing, 2015

2014
In-place transposition of rectangular matrices on accelerators.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Low-textured regions detection for improving stereoscopy algorithms.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

CUVLE: Variable-length encoding on CUDA.
Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing, 2014

2013
Performance Modeling of Atomic Additions on GPU Scratchpad Memory.
IEEE Trans. Parallel Distributed Syst., 2013

An optimized approach to histogram computation on GPU.
Mach. Vis. Appl., 2013

A robust and low resource FPGA-based stereoscopic vision algorithm.
Proceedings of the 2012 International Conference on Reconfigurable Computing and FPGAs, 2013

Simulation and architecture improvements of atomic operations on GPU scratchpad memory.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

2012
Performance models for asynchronous data transfers on consumer Graphics Processing Units.
J. Parallel Distributed Comput., 2012

2011
Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study.
Int. J. High Perform. Comput. Appl., 2011

simARQ, An Automatic Repeat Request Simulator for Teaching Purposes.
Proceedings of the IT Revolutions, 2011

Egomotion compensation and moving objects detection algorithm on GPU.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

2010
Parallelizing and Optimizing LIP-Canny Using NVIDIA CUDA.
Proceedings of the Trends in Applied Intelligent Systems, 2010

2009
MESI Cache Coherence Simulator for Teaching Purposes.
CLEI Electron. J., 2009

FPGA Implementation of the Generalized Hough Transform.
Proceedings of the ReConFig'09: 2009 International Conference on Reconfigurable Computing and FPGAs, 2009

Parallelization of a Video Segmentation Algorithm on CUDA-Enabled Graphics Processing Units.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008
Biprocessor SoC in an FPGA for Teaching Purposes.
Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies, 2008

