Federico Silla

Extending rCUDA with Support for P2P Memory Copies between Remote GPUs.

Remote GPU Virtualization: Is It Useful?

Proceedings of the 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era HiPINEB@HPCA 2016, 2016

Providing CUDA Acceleration to KVM Virtual Machines in InfiniBand Clusters with rCUDA.

Ferran Perez

Proceedings of the Distributed Applications and Interoperable Systems, 2016

Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015

On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters.

Parallel Comput., 2015

Improving the user experience of the rCUDA remote GPU virtualization framework.

Concurr. Comput. Pract. Exp., 2015

Local and Remote GPUs Perform Similar with EDR 100G InfiniBand.

Proceedings of the Industrial Track of the 16th International Middleware Conference, 2015

A Live Demo on Remote GPU Accelerated Deep Learning Using the rCUDA Middleware.

Proceedings of the Posters and Demos Session of the 16th International Middleware Conference, 2015

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application.

Proceedings of the 11th IEEE International Conference on e-Science, 2015

On the Execution of Computationally Intensive CPU-Based Libraries on Remote Accelerators for Increasing Performance: Early Experience with the OpenBLAS and FFTW Libraries.

Santiago Mislata Valero

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

InfiniBand Verbs Optimizations for Remote GPU Virtualization.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

A Performance Comparison of CUDA Remote GPU Virtualization Frameworks.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

On the Design of a Demo for Exhibiting rCUDA.

Ferran Perez

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

A complete and efficient CUDA-sharing solution for HPC clusters.

Parallel Comput., 2014

Special issue on unconventional cluster architectures and applications.

Holger Fröning

Clust. Comput., 2014

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study.

Sergio Iserte

Adrián Castelló

Rafael Mayo

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

Silicon-aware distributed switch architecture for on-chip networks.

J. Syst. Archit., 2013

Influence of InfiniBand FDR on the performance of remote GPU virtualization.

Rafael Mayo

Antonio J. Peña

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012

On the Impact of Within-Die Process Variation in GALS-Based NoC Performance.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

A new degree of freedom for memory allocation in clusters.

Clust. Comput., 2012

Enabling High-Performance Crossbars through a Floorplan-Aware Design.

Proceedings of the 41st International Conference on Parallel Processing, 2012

CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution.

Proceedings of the 19th International Conference on High Performance Computing, 2012

Addressing Link Degradation in NoC-Based ULSI Designs.

Carles Hernández

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters.

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011

HyperTransport.

Proceedings of the Encyclopedia of Parallel Computing, 2011

Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems.

Jesús Camacho Villanueva

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

A low-latency modular switch for CMP systems.

Microprocess. Microsystems, 2011

Characterizing the impact of process variation on 45 nm NoC-based CMPs.

J. Parallel Distributed Comput., 2011

Self-Calibrating Source Synchronous Communication for Delay Variation Tolerant GALS Network-on-Chip Design.

Int. J. Embed. Real Time Commun. Syst., 2011

Fault-Tolerant Vertical Link Design for Effective 3D Stacking.

IEEE Comput. Archit. Lett., 2011

A Distributed Switch Architecture for On-Chip Networks.

Proceedings of the International Conference on Parallel Processing, 2011

Energy and Performance Efficient Thread Mapping in NoC-Based CMPs under Process Variations.

Carles Hernández

Proceedings of the International Conference on Parallel Processing, 2011

Performance of CUDA Virtualized Remote GPUs in High Performance Clusters.

Proceedings of the International Conference on Parallel Processing, 2011

MEMSCALE™: A Scalable Environment for Databases.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Unleash Your Memory-Constrained Applications: A 32-Node Non-coherent Distributed-Memory Prototype Cluster.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Highly scalable barriers for future high-performance computing clusters.

Proceedings of the 18th International Conference on High Performance Computing, 2011

Enabling CUDA acceleration within virtual machines using rCUDA.

Antonio J. Peña

Juan Carlos Fernández

Rafael Mayo

Dionisios N. Pnevmatikatos

Proceedings of the 18th International Conference on High Performance Computing, 2011

MEMSCALE: in-cluster-memory databases.

Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010

Explicit Communication and Synchronization in SARC.

Manolis Katevenis

Vassilis Papaefstathiou

Stamatis G. Kavadias

Dimitrios S. Nikolopoulos

IEEE Micro, 2010

Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing.

Jesús Camacho Villanueva

Proceedings of the NOCS 2010, 2010

Improving the Performance of GALS-Based NoCs in the Presence of Process Variation.

Proceedings of the NOCS 2010, 2010

Process variation and layout mismatch tolerant design of source synchronous links for GALS networks-on-chip.

Proceedings of the 2010 International Symposium on System on Chip, SoC 2010, Tampere, 2010

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters.

Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

A practical way to extend shared memory support beyond a motherboard at low cost.

Héctor Montaner

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

VCTlite: Towards an efficient implementation of virtual cut-through switching in on-chip networks.

Proceedings of the 2010 International Conference on High Performance Computing, 2010

A Latency-Efficient Router Architecture for CMP Systems.

Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

A methodology for the characterization of process variation in NoC links.

Carles Hernández

Proceedings of the Design, Automation and Test in Europe, 2010

Getting Rid of Coherency Overhead for Memory-Hungry Applications.

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009

Yield-oriented evaluation methodology of network-on-chip routing implementations.

Proceedings of the 2008 IEEE International Symposium on System-on-Chip, 2009

A new mechanism to deal with process variability in NoC links.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

An Efficient Implementation of GPU Virtualization in High Performance Clusters.

Proceedings of the Euro-Par 2009, 2009

2008

Evaluation of memory performance on the cell BE with the SARC programming model.

Proceedings of the 9th workshop on MEmory performance, 2008

Network Reconfiguration Suitability for Scientific Applications.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

2004

On the development of a communication-aware task mapping technique.

J. Syst. Archit., 2004

2003

LSOM: A Link State Protocol Over Mac Addresses for Metropolitan Backbones Using Optical Ethernet Switches.

Román García

Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA 2003), 2003

2002

A Clustering Method for Modeling the Communication Requirements of Message-Passing Applications.

Comput. Artif. Intell., 2002

A comparative study of arbitration algorithms for the Alpha 21364 pipelined router.

Shubhendu S. Mukherjee

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.

J. Parallel Distributed Comput., 2001

Towards a Communication-Aware Task Scheduling Strategy for Heterogeneous Systems.

Comput. Artif. Intell., 2001

On the Impact of Message Packetization in Networks of Workstations with Irregular Topology.

Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

On the Scalability of Topologies for Storage Area Networks in Building Environments.

Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

On the Interconnection Topology for Storage Area Networks.

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

A New Task Mapping Technique for Communication-Aware Scheduling Strategies.

Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

On the Switch Architecture for Fibre Channel Storage Area Networks.

Proceedings of the Eighth International Conference on Parallel and Distributed Systems, 2001

Improving Network Performance by Efficiently Dealing with Short Control Messages in Fibre Channel SANs.

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

A Tool for the Design and Evaluation of Fibre Channel Storage Area Networks.

Proceedings of the 34th Annual Simulation Symposium (SS 2001), 2001

2000

High-Performance Routing in Networks of Workstations with Irregular Topology.

IEEE Trans. Parallel Distributed Syst., 2000

Modeling and Simulation of Storage Area Networks.

Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2000), 29 August 2000

On the Effect of Link Failures in Fibre Channel Storage Area Networks.

Proceedings of the 5th International Symposium on Parallel Architectures, 2000

Performance Sensitivity of Routing Algorithms to Failures in Networks of Workstations.

Proceedings of the High Performance Computing, Third International Symposium, 2000

On the Influence of the Selection Function on the Performance of Networks of Workstations.

Proceedings of the High Performance Computing, Third International Symposium, 2000

Performance analysis of storage area networks using high-speed LAN interconnects.

Proceedings of the IEEE International Conference on Networks 2000: Networking Trends and Challenges in the New Millennium, 2000

Modeling and Simulation of a Network of Workstations with Wormhole Switching.

Xavier Molero

Vicente Santonja

Proceedings of the 33rd Annual Simulation Symposium (SS 2000), 2000

1999

A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Is It Worth the Flexibility Provided by Irregular Topologies in Networks of Workstations?

Proceedings of the Network-Based Parallel Computing: Communication, 1999

1998

Improving Performance of Networks of Workstations by using Disha Concurrent.

Antonio Robles

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic.

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Virtual channel multiplexing in networks of workstations with irregular topology.

Anand Sivasubramaniam

Chita R. Das

Proceedings of the 5th International Conference On High Performance Computing, 1998

1997

On the Use of Virtual Channels in Networks of Workstations with Irregular Topology.

Proceedings of the Parallel Computer Routing and Communication, 1997

Improving the efficiency of adaptive routing in networks with irregular topology.

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Efficient Adaptive Routing in Networks of Workstations with Irregular Topology.

Proceedings of the Communication and Architectural Support for Network-Based Parallel Computing, 1997

1996

A High Performance Router Architecture for Interconnection Networks.

Pedro López