Nacho Navarro

According to our database, Nacho Navarro authored at least 72 papers between 1995 and 2017.

Bibliography

2017
Data stream classification using random feature functions and novel method combinations.
Journal of Systems and Software, 2017

Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors.
International Journal of Parallel Programming, 2017

Efficient exception handling support for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

An open benchmark implementation for multi-CPU multi-GPU pedestrian detection in automotive systems.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.
Proceedings of the Twelfth European Conference on Computer Systems, 2017

2016
The AXIOM software layers.
Microprocessors and Microsystems - Embedded Hardware Design, 2016

Optimization of atmospheric transport models on HPC platforms.
Computers & Geosciences, 2016

2015
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
IEEE Trans. Parallel Distrib. Syst., 2015

The AXIOM project (Agile, eXtensible, fast I/O Module).
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

GPU-SM: shared memory multi-GPU programming.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015


2014
TERAFLUX: Harnessing dataflow in next generation teradevices.
Microprocessors and Microsystems - Embedded Hardware Design, 2014

Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

CODOMs: Protecting software with Code-centric memory Domains.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Enabling preemptive multiprogramming on GPUs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Software-Managed Power Reduction in Infiniband Links.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Automatic execution of single-GPU computations across multiple GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013

A template system for the efficient compilation of domain abstractions onto reconfigurable computers.
Journal of Systems Architecture - Embedded Systems Design, 2013

Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013


Comparison based sorting for systems with multiple GPUs.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012
Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Generation Comp. Syst., 2012

POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Hardware-software coherence protocol for the coexistence of caches and local memories.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PPMC: Hardware scheduling and memory management support for multi accelerators.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

BSArc: blacksmith streaming architecture for HPC accelerators.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

PPMC: A Programmable Pattern Based Memory Controller.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011
Assessing Accelerator-Based HPC Reverse Time Migration.
IEEE Trans. Parallel Distrib. Syst., 2011

Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011

TARCAD: A template architecture for reconfigurable accelerator designs.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Multicore: The View from Europe.
IEEE Micro, 2010

Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010

FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

An asymmetric distributed shared memory model for heterogeneous parallel systems.
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

2009
Linux Kernel Compaction through Cold Code Swapping.
Trans. HiPEAC, 2009

REMOTE, a Wireless Sensor Network Based System to Monitor Rowing Performance.
Sensors, 2009

CASES 2007 guest editors' introduction.
Design Autom. for Emb. Sys., 2009

High-Performance Reverse Time Migration on GPU.
Proceedings of the 2009 International Conference of the Chilean Computer Science Society, 2009

Cetra: A trace and analysis framework for the evaluation of Cell BE systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Predictive Runtime Code Scheduling for Heterogeneous Architectures.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Exploiting memory customization in FPGA for 3D stencil computations.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

2008
CUBA: an architecture for efficient CPU/co-processor data communication.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

2007
High-Performance Embedded Architecture and Compilation Roadmap.
Trans. HiPEAC, 2007

Implicitly Parallel Programming Models for Thousand-Core Microprocessors.
Proceedings of the 44th Design Automation Conference, 2007

CIGAR: Application Partitioning for a CPU/Coprocessor Architecture.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining.
IEEE Trans. Computers, 2006

Java Virtual Machine: the key for accurated memory prefetching.
Proceedings of the International Conference on Software Engineering Research and Practice & Conference on Programming Languages and Compilers, 2006

2003
Beating in-order stalls with "flea-flicker" two-pass pipelining.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Evaluating the importance of virtual memory for Java.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

2001
Strategies for the efficient exploitation of loop-level parallelism in Java.
Concurrency and Computation: Practice and Experience, 2001

2000
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurrency - Practice and Experience, 2000

DITools: Application-level Support for Dynamic Extension and Flexible Composition.
Proceedings of the General Track: 2000 USENIX Annual Technical Conference, 2000

OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Towards an efficient exploitation of loop-level parallelism in Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

1999
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999

Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999

1998
Experiences on implementing PARMACS macros to run the SPLASH-2 suite on multiprocessors.
PDP, 1998

Kernel-level Scheduling for the Nano-threads Programming Model.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Analysis of Several Scheduling Algorithms under the Nano-Thread Programming Model.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996
A Library Implementation of the Nano-Threads Programming Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
The eXc Model: Scheduler-Activations on Mach 3.0.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995

