Federico Silla

According to our database, Federico Silla authored at least 102 papers between 1996 and 2018.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2018
Intra-Node Memory Safe GPU Co-Scheduling.
IEEE Trans. Parallel Distrib. Syst., 2018

Enhancing large-scale docking simulation on heterogeneous systems: An MPI vs rCUDA study.
Future Generation Comp. Syst., 2018

Heterogeneous and unconventional cluster architectures and applications.
Concurrency and Computation: Practice and Experience, 2018

Accelerator Virtualization in Fog Computing: Moving from the Cloud to the Edge.
IEEE Cloud Computing, 2018

Leveraging rCUDA for Enhancing Low-Power Deployments in the Physics Domain.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Exploring the Use of Remote GPU Virtualization in Low-Power Systems for Bioinformatics Applications.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Made-to-Measure GPUs on Virtual Machines with rCUDA.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Increasing Molecular Dynamics Simulations Throughput by Virtualizing Remote GPUs with rCUDA.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Improving the Efficiency of Future Exascale Systems with rCUDA.
Proceedings of the 4th IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2018

2017
Multi-tenant virtual GPUs for optimising performance of a financial risk application.
J. Parallel Distrib. Comput., 2017

On the benefits of the remote GPU virtualization mechanism: The rCUDA case.
Concurrency and Computation: Practice and Experience, 2017

A Comparative Performance Analysis of Remote GPU Virtualization over Three Generations of GPUs.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Turning GPUs into Floating Devices over the Cluster: The Beauty of GPU Migration.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Enhancing the rCUDA Remote GPU Virtualization Framework: from a Prototype to a Production Solution.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

A Live Demo for Showing the Benefits of Applying the Remote GPU Virtualization Technique to Cloud Computing.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Tuning remote GPU virtualization for InfiniBand networks.
The Journal of Supercomputing, 2016

Heterogeneous cluster architectures and applications.
Concurrency and Computation: Practice and Experience, 2016

CUDA acceleration for Xen virtual machines in infiniband clusters with rCUDA.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Reducing the performance gap of remote GPU virtualization with InfiniBand Connect-IB.
Proceedings of the IEEE Symposium on Computers and Communication, 2016

schedGPU: Fine-grain dynamic and adaptative scheduling for GPUs.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016

Using Remote Accelerators to Improve the Performance of the FFTW Library.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Performance Evaluation of the NVIDIA Pascal GPU Architecture: Early Experiences.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Extending rCUDA with Support for P2P Memory Copies between Remote GPUs.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Remote GPU Virtualization: Is It Useful?
Proceedings of the 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era HiPINEB@HPCA 2016, 2016

Providing CUDA Acceleration to KVM Virtual Machines in InfiniBand Clusters with rCUDA.
Proceedings of the Distributed Applications and Interoperable Systems, 2016

Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters.
Parallel Computing, 2015

Improving the user experience of the rCUDA remote GPU virtualization framework.
Concurrency and Computation: Practice and Experience, 2015

Local and Remote GPUs Perform Similar with EDR 100G InfiniBand.
Proceedings of the Industrial Track of the 16th International Middleware Conference, 2015

A Live Demo on Remote GPU Accelerated Deep Learning Using the rCUDA Middleware.
Proceedings of the Posters and Demos Session of the 16th International Middleware Conference, 2015

Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application.
Proceedings of the 11th IEEE International Conference on e-Science, 2015

On the Execution of Computationally Intensive CPU-Based Libraries on Remote Accelerators for Increasing Performance: Early Experience with the OpenBLAS and FFTW Libraries.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

InfiniBand Verbs Optimizations for Remote GPU Virtualization.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

A Performance Comparison of CUDA Remote GPU Virtualization Frameworks.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

On the Design of a Demo for Exhibiting rCUDA.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
A complete and efficient CUDA-sharing solution for HPC clusters.
Parallel Computing, 2014

Special issue on unconventional cluster architectures and applications.
Cluster Computing, 2014

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Silicon-aware distributed switch architecture for on-chip networks.
Journal of Systems Architecture - Embedded Systems Design, 2013

Influence of InfiniBand FDR on the performance of remote GPU virtualization.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012
On the Impact of Within-Die Process Variation in GALS-Based NoC Performance.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2012

A new degree of freedom for memory allocation in clusters.
Cluster Computing, 2012

Enabling High-Performance Crossbars through a Floorplan-Aware Design.
Proceedings of the 41st International Conference on Parallel Processing, 2012

CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Addressing Link Degradation in NoC-Based ULSI Designs.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011
HyperTransport.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems.
IEEE Trans. on CAD of Integrated Circuits and Systems, 2011

A low-latency modular switch for CMP systems.
Microprocessors and Microsystems - Embedded Hardware Design, 2011

Characterizing the impact of process variation on 45 nm NoC-based CMPs.
J. Parallel Distrib. Comput., 2011

Self-Calibrating Source Synchronous Communication for Delay Variation Tolerant GALS Network-on-Chip Design.
IJERTCS, 2011

Fault-Tolerant Vertical Link Design for Effective 3D Stacking.
Computer Architecture Letters, 2011

A Distributed Switch Architecture for On-Chip Networks.
Proceedings of the International Conference on Parallel Processing, 2011

Energy and Performance Efficient Thread Mapping in NoC-Based CMPs under Process Variations.
Proceedings of the International Conference on Parallel Processing, 2011

Performance of CUDA Virtualized Remote GPUs in High Performance Clusters.
Proceedings of the International Conference on Parallel Processing, 2011

MEMSCALE™: A Scalable Environment for Databases.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Unleash Your Memory-Constrained Applications: A 32-Node Non-coherent Distributed-Memory Prototype Cluster.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Highly scalable barriers for future high-performance computing clusters.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Enabling CUDA acceleration within virtual machines using rCUDA.
Proceedings of the 18th International Conference on High Performance Computing, 2011

MEMSCALE: in-cluster-memory databases.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010
Explicit Communication and Synchronization in SARC.
IEEE Micro, 2010

Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing.
Proceedings of the NOCS 2010, 2010

Improving the Performance of GALS-Based NoCs in the Presence of Process Variation.
Proceedings of the NOCS 2010, 2010

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters.
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

A practical way to extend shared memory support beyond a motherboard at low cost.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

VCTlite: Towards an efficient implementation of virtual cut-through switching in on-chip networks.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

A Latency-Efficient Router Architecture for CMP Systems.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

A methodology for the characterization of process variation in NoC links.
Proceedings of the Design, Automation and Test in Europe, 2010

Getting Rid of Coherency Overhead for Memory-Hungry Applications.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009
A new mechanism to deal with process variability in NoC links.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

An Efficient Implementation of GPU Virtualization in High Performance Clusters.
Proceedings of the Euro-Par 2009, 2009

2008
Network Reconfiguration Suitability for Scientific Applications.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

2004
On the development of a communication-aware task mapping technique.
Journal of Systems Architecture, 2004

2003
LSOM: A Link State Protocol Over Mac Addresses for Metropolitan Backbones Using Optical Ethernet Switches.
Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA 2003), 2003

2002
A Clustering Method for Modeling the Communication Requirements of Message-Passing Applications.
Computers and Artificial Intelligence, 2002

A comparative study of arbitration algorithms for the Alpha 21364 pipelined router.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.
J. Parallel Distrib. Comput., 2001

Towards a Communication-Aware Task Scheduling Strategy for Heterogeneous Systems.
Computers and Artificial Intelligence, 2001

On the Impact of Message Packetization in Networks of Workstations with Irregular Topology.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

On the Scalability of Topologies for Storage Area Networks in Building Environments.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

On the Interconnection Topology for Storage Area Networks.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

A New Task Mapping Technique for Communication-Aware Scheduling Strategies.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

On the Switch Architecture for Fibre Channel Storage Area Networks.
Proceedings of the Eighth International Conference on Parallel and Distributed Systems, 2001

Improving Network Performance by Efficiently Dealing with Short Control Messages in Fibre Channel SANs.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

A Tool for the Design and Evaluation of Fibre Channel Storage Area Networks.
Proceedings of the 34th Annual Simulation Symposium (SS 2001), 2001

2000
High-Performance Routing in Networks of Workstations with Irregular Topology.
IEEE Trans. Parallel Distrib. Syst., 2000

Modeling and Simulation of Storage Area Networks.
Proceedings of MASCOTS 2000, the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 29 August, 2000

On the Effect of Link Failures in Fibre Channel Storage Area Networks.
Proceedings of the 5th International Symposium on Parallel Architectures, 2000

Performance Sensitivity of Routing Algorithms to Failures in Networks of Workstations.
Proceedings of the High Performance Computing, Third International Symposium, 2000

On the Influence of the Selection Function on the Performance of Networks of Workstations.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Performance analysis of storage area networks using high-speed LAN interconnects.
Proceedings of the IEEE International Conference on Networks 2000: Networking Trends and Challenges in the New Millennium, 2000

Modeling and Simulation of a Network of Workstations with Wormhole Switching.
Proceedings of the 33rd Annual Simulation Symposium (SS 2000), 2000

1999
A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Is It Worth the Flexibility Provided by Irregular Topologies in Networks of Workstations?
Proceedings of the Network-Based Parallel Computing: Communication, 1999

1998
Improving Performance of Networks of Workstations by using Disha Concurrent.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Virtual channel multiplexing in networks of workstations with irregular topology.
Proceedings of the 5th International Conference On High Performance Computing, 1998

1997
On the Use of Virtual Channels in Networks of Workstations with Irregular Topology.
Proceedings of the Parallel Computer Routing and Communication, 1997

Improving the efficiency of adaptive routing in networks with irregular topology.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Efficient Adaptive Routing in Networks of Workstations with Irregular Topology.
Proceedings of the Communication and Architectural Support for Network-Based Parallel Computing, 1997

1996
A High Performance Router Architecture for Interconnection Networks.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

