According to our database, Nacho Navarro authored at least 72 papers between 1995 and 2017.
Data stream classification using random feature functions and novel method combinations.
Journal of Systems and Software, 2017
Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors.
International Journal of Parallel Programming, 2017
Efficient exception handling support for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
An open benchmark implementation for multi-CPU multi-GPU pedestrian detection in automotive systems.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017
Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.
Proceedings of the Twelfth European Conference on Computer Systems, 2017
The AXIOM software layers.
Microprocessors and Microsystems - Embedded Hardware Design, 2016
Optimization of atmospheric transport models on HPC platforms.
Computers & Geosciences, 2016
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
IEEE Trans. Parallel Distrib. Syst., 2015
The AXIOM project (Agile, eXtensible, fast I/O Module).
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015
GPU-SM: shared memory multi-GPU programming.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
The AXIOM Software Layers.
Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015
TERAFLUX: Harnessing dataflow in next generation teradevices.
Microprocessors and Microsystems - Embedded Hardware Design, 2014
Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
CODOMs: Protecting software with Code-centric memory Domains.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Enabling preemptive multiprogramming on GPUs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Software-Managed Power Reduction in Infiniband Links.
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Automatic execution of single-GPU computations across multiple GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013
A template system for the efficient compilation of domain abstractions onto reconfigurable computers.
Journal of Systems Architecture - Embedded Systems Design, 2013
Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013
Comparison based sorting for systems with multiple GPUs.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013
Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Generation Comp. Syst., 2012
POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012
Hardware-software coherence protocol for the coexistence of caches and local memories.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012
PPMC: Hardware scheduling and memory management support for multi accelerators.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
BSArc: blacksmith streaming architecture for HPC accelerators.
Proceedings of the Computing Frontiers Conference, CF'12, 2012
PPMC: A Programmable Pattern Based Memory Controller.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012
Assessing Accelerator-Based HPC Reverse Time Migration.
IEEE Trans. Parallel Distrib. Syst., 2011
Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011
TARCAD: A template architecture for reconfigurable accelerator designs.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011
Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011
Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011
FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
Multicore: The View from Europe.
IEEE Micro, 2010
Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010
FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010
An asymmetric distributed shared memory model for heterogeneous parallel systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
Linux Kernel Compaction through Cold Code Swapping.
Trans. HiPEAC, 2009
REMOTE, a Wireless Sensor Network Based System to Monitor Rowing Performance.
CASES 2007 guest editors' introduction.
Design Autom. for Emb. Sys., 2009
High-Performance Reverse Time Migration on GPU.
Proceedings of the 2009 International Conference of the Chilean Computer Science Society, 2009
Cetra: A trace and analysis framework for the evaluation of Cell BE systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009
Predictive Runtime Code Scheduling for Heterogeneous Architectures.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009
Exploiting memory customization in FPGA for 3D stencil computations.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009
CUBA: an architecture for efficient CPU/co-processor data communication.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008
High-Performance Embedded Architecture and Compilation Roadmap.
Trans. HiPEAC, 2007
Implicitly Parallel Programming Models for Thousand-Core Microprocessors.
Proceedings of the 44th Design Automation Conference, 2007
CIGAR: Application Partitioning for a CPU/Coprocessor Architecture.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining.
IEEE Trans. Computers, 2006
Java Virtual Machine: the key for accurated memory prefetching.
Proceedings of the International Conference on Software Engineering Research and Practice & Conference on Programming Languages and Compilers, 2006
Beating in-order stalls with "flea-flicker" two-pass pipelining.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003
Evaluating the importance of virtual memory for Java.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003
Strategies for the efficient exploitation of loop-level parallelism in Java.
Concurrency and Computation: Practice and Experience, 2001
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurrency - Practice and Experience, 2000
DITools: Application-level Support for Dynamic Extension and Flexible Composition.
Proceedings of the General Track: 2000 USENIX Annual Technical Conference, 2000
OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000
A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000
Towards an efficient exploitation of loop-level parallelism in Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000
Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999
Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999
Experiences on implementing PARMACS macros to run the SPLASH-2 suite on multiprocessors.
Kernel-level Scheduling for the Nano-threads Programming Model.
Proceedings of the 12th international conference on Supercomputing, 1998
Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997
Analysis of Several Scheduling Algorithms under the Nano-Thread Programming Model.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997
A Library Implementation of the Nano-Threads Programming Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996
The eXc Model: Scheduler-Activations on Mach 3.0.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995