José Duato

Orcid: 0000-0002-7785-0607

Affiliations:
  • Polytechnic University of Valencia, Spain


According to our database, José Duato authored at least 453 papers between 1991 and 2023.

Bibliography

2023
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
Computing, May, 2023

GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge.
CoRR, 2023

2021
UPR: deadlock-free dynamic network reconfiguration by exploiting channel dependency graph compatibility.
J. Supercomput., 2021

Enforcing Predictability of Many-Cores With DCFNoC.
IEEE Trans. Computers, 2021

DVL-Lossy: Isolating Congesting Flows to Optimize Packet Dropping in Lossy Data-Center Networks.
IEEE Micro, 2021

Accelerating distributed deep neural network training with pipelined MPI allreduce.
Clust. Comput., 2021

Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Performance Modeling for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

2020
Path2SL: Leveraging InfiniBand Resources to Reduce Head-of-Line Blocking in Fat Trees.
IEEE Micro, 2020

HP-DCFNoC: High Performance Distributed Dynamic TDM Scheduler Based on DCFNoC Theory.
IEEE Access, 2020

Bundlefly: a low-diameter topology for multicore fiber.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Optimizing Packet Dropping by Efficient Congesting-Flow Isolation in Lossy Data-Center Networks.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2020

2019
Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies.
ACM Trans. Archit. Code Optim., 2019

Constructing virtual 5-dimensional tori out of lower-dimensional network cards.
Concurr. Comput. Pract. Exp., 2019

Efficient Dynamic Isolation of Congestion in Lossless DataCenter Networks.
Proceedings of the ACM SIGCOMM 2019 Workshop on Networking for Emerging Applications and Technologies, 2019

Analysis of model parallelism for distributed neural networks.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Modeling Traffic Workloads in Data-center Network Simulation Tools.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Path2SL: Optimizing Head-of-Line Blocking Reduction in InfiniBand-Based Fat-Tree Networks.
Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

DCFNoC: A Delayed Conflict-Free Time Division Multiplexing Network on Chip.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Feasible enhancements to congestion control in InfiniBand-based networks.
J. Parallel Distributed Comput., 2018

Accurately modeling the on-chip and off-chip GPU memory subsystem.
Future Gener. Comput. Syst., 2018

2017
TLB-Based Temporality-Aware Classification in CMPs with Multilevel TLBs.
IEEE Trans. Parallel Distributed Syst., 2017

Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores.
IEEE Trans. Computers, 2017

Speeding up the execution of numerical computations and simulations with rCUDA.
Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017), 2017

A Case Study on Implementing Virtual 5D Torus Networks Using Network Components of Lower Dimensionality.
Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

Enhancing the rCUDA Remote GPU Virtualization Framework: from a Prototype to a Production Solution.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
A Family of Fault-Tolerant Efficient Indirect Topologies.
IEEE Trans. Parallel Distributed Syst., 2016

Efficient TLB-Based Detection of Private Pages in Chip Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2016

The k-ary n-direct s-indirect family of topologies for large-scale interconnection networks.
J. Supercomput., 2016

Bandwidth-Aware On-Line Scheduling in SMT Multicores.
IEEE Trans. Computers, 2016

Adaptive Routing for N-Dimensional Twin Torus.
IEEE Trans. Computers, 2016

A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks.
Future Gener. Comput. Syst., 2016

Impact of Memory-Level Parallelism on the Performance of GPU Coherence Protocols.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

TokenTLB: A Token-Based Page Classification Approach.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Efficient and Cost-Effective Hybrid Congestion Control for HPC Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 2015

Optimizing the configuration of combined high-radix switches.
J. Supercomput., 2015

A HoL-blocking aware mechanism for selecting the upward path in fat-tree topologies.
J. Supercomput., 2015

Design of Hybrid Second-Level Caches.
IEEE Trans. Computers, 2015

N-Dimensional Twin Torus Topology.
IEEE Trans. Computers, 2015

On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters.
Parallel Comput., 2015

A reuse-based refresh policy for energy-aware eDRAM caches.
Microprocess. Microsystems, 2015

Improving the user experience of the rCUDA remote GPU virtualization framework.
Concurr. Comput. Pract. Exp., 2015

A parallel and sensitive software tool for methylation analysis on multicore platforms.
Bioinform., 2015

Addressing Fairness in SMT Multicores with a Progress-Aware Scheduler.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Accurately modeling the GPU memory subsystem.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

2014
Cache-Hierarchy Contention-Aware Scheduling in CMPs.
IEEE Trans. Parallel Distributed Syst., 2014

Formalization and configuration methodology for high-radix combined switches.
J. Supercomput., 2014

Efficient Routing in Heterogeneous SoC Designs with Small Implementation Overhead.
IEEE Trans. Computers, 2014

Building 3D Torus Using Low-Profile Expansion Cards.
IEEE Trans. Computers, 2014

A complete and efficient CUDA-sharing solution for HPC clusters.
Parallel Comput., 2014

A new proposal to deal with congestion in InfiniBand-based fat-trees.
J. Parallel Distributed Comput., 2014

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

FT-RUFT: A Performance and Fault-Tolerant Efficient Indirect Topology.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

Achieving balanced buffer utilization with a proper co-design of flow control and routing algorithm.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

Optimal Configuration for N-Dimensional Twin Torus Networks.
Proceedings of the 2014 IEEE 13th International Symposium on Network Computing and Applications, 2014

Addressing bandwidth contention in SMT multicores through scheduling.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Dynamic WCET Estimation for Real-Time Multicore Embedded Systems Supporting DVFS.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

HoL-Blocking Avoidance Routing Algorithms in Direct Topologies.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Deadlock-free routing mechanism for 3D twin torus networks.
Proceedings of the 8th International Workshop on Interconnection Network Architecture, 2014

Combining HoL-blocking avoidance and differentiated services in high-speed interconnects.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
An Effective and Feasible Congestion Management Technique for High-Performance MINs with Tag-Based Distributed Routing.
IEEE Trans. Parallel Distributed Syst., 2013

Hardware-Based Generation of Independent Subtraces of Instructions in Clustered Processors.
IEEE Trans. Computers, 2013

Increasing the Effectiveness of Directory Caches by Avoiding the Tracking of Noncoherent Memory Blocks.
IEEE Trans. Computers, 2013

Silicon-aware distributed switch architecture for on-chip networks.
J. Syst. Archit., 2013

Obtaining the optimal configuration of high-radix Combined switches.
J. Parallel Distributed Comput., 2013

Power-aware scheduling with effective task migration for real-time multicore embedded systems.
Concurr. Comput. Pract. Exp., 2013

Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches.
Proceedings of the International Conference on Supercomputing, 2013

Temporal-Aware Mechanism to Detect Private Data in Chip Multiprocessors.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Deterministic Routing with HoL-Blocking-Awareness for Direct Topologies.
Proceedings of the International Conference on Computational Science, 2013

Using Huge Pages and Performance Counters to Determine the LLC Architecture.
Proceedings of the International Conference on Computational Science, 2013

BBQ: A Straightforward Queuing Scheme to Reduce HoL-Blocking in High-Performance Hybrid Networks.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes.
Proceedings of the Design, Automation and Test in Europe, 2013

Influence of InfiniBand FDR on the performance of remote GPU virtualization.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

L1-bandwidth aware thread allocation in multicore SMT processors.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches.
IEEE Trans. Very Large Scale Integr. Syst., 2012

A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms.
IEEE Trans. Parallel Distributed Syst., 2012

A cost-effective heuristic to schedule local and remote memory in cluster computers.
J. Supercomput., 2012

On the Impact of Within-Die Process Variation in GALS-Based NoC Performance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches.
IEEE Trans. Computers, 2012

Extending Magny-Cours Cache Coherence.
IEEE Trans. Computers, 2012

Progressive Congestion Management Based on Packet Marking and Validation Techniques.
IEEE Trans. Computers, 2012

Combining recency of information with selective random and a victim cache in last-level caches.
ACM Trans. Archit. Code Optim., 2012

Switch-based packing technique to reduce traffic and latency in token coherence.
J. Parallel Distributed Comput., 2012

A new degree of freedom for memory allocation in clusters.
Clust. Comput., 2012

A Topology-Independent Mapping Technique for Application-Specific Networks-on-Chip.
Comput. Informatics, 2012

Efficiently Handling Memory Accesses to Improve QoS in Multicore Systems under Real-Time Constraints.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

Optimal Configuration of High-Radix Combined Switches.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

A New Family of Hybrid Topologies for Large-Scale Interconnection Networks.
Proceedings of the 11th IEEE International Symposium on Network Computing and Applications, 2012

Cache Miss Characterization in Hierarchical Large-Scale Cache-Coherent Systems.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Enabling High-Performance Crossbars through a Floorplan-Aware Design.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Page-Based Memory Allocation Policies of Local and Remote Memory in Cluster Computers.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

IODET: A HoL-blocking-aware Deterministic Routing Algorithm for Direct Topologies.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

Analyzing the optimal ratio of SRAM banks in hybrid caches.
Proceedings of the 30th International IEEE Conference on Computer Design, 2012

CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Addressing Link Degradation in NoC-Based ULSI Designs.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Towards an Efficient Fat-Tree like Topology.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Exploiting SIMD Instructions in Current Processors to Improve Classical String Algorithms.
Proceedings of the Advances in Databases and Information Systems, 2012

PS-Dir: a scalable two-level directory cache.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Efficient and Scalable Starvation Prevention Mechanism for Token Coherence.
IEEE Trans. Parallel Distributed Syst., 2011

Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2011

Dynamic Fault Tolerance in Fat Trees.
IEEE Trans. Computers, 2011

A low-latency modular switch for CMP systems.
Microprocess. Microsystems, 2011

Characterizing the impact of process variation on 45 nm NoC-based CMPs.
J. Parallel Distributed Comput., 2011

OBQA: Smart and cost-efficient queue scheme for Head-of-Line blocking elimination in fat-trees.
J. Parallel Distributed Comput., 2011

A Communication-Driven Routing Technique for Application-Specific NoCs.
Int. J. Parallel Program., 2011

How to reduce packet dropping in a bufferless NoC.
Concurr. Comput. Pract. Exp., 2011

Cost-effective queue schemes for reducing head-of-line blocking in fat-trees.
Concurr. Comput. Pract. Exp., 2011

A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems.
Comput. J., 2011

Fault-Tolerant Vertical Link Design for Effective 3D Stacking.
IEEE Comput. Archit. Lett., 2011

MRU-Tour-based Replacement Algorithms for Last-Level Caches.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Efficient routing implementation in complex systems-on-chip.
Proceedings of the NOCS 2011, 2011

Evaluation of an Alternative for Increasing Switch Radix.
Proceedings of The Tenth IEEE International Symposium on Networking Computing and Applications, 2011

Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks.
Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

A Distributed Switch Architecture for On-Chip Networks.
Proceedings of the International Conference on Parallel Processing, 2011

Energy and Performance Efficient Thread Mapping in NoC-Based CMPs under Process Variations.
Proceedings of the International Conference on Parallel Processing, 2011

Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks.
Proceedings of the International Conference on Parallel Processing, 2011

Performance of CUDA Virtualized Remote GPUs in High Performance Clusters.
Proceedings of the International Conference on Parallel Processing, 2011

PC-Mesh: A Dynamic Parallel Concentrated Mesh.
Proceedings of the International Conference on Parallel Processing, 2011

A Cluster Computer Performance Predictor for Memory Scheduling.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2011

C-Switches: Increasing Switch Radix with Current Integration Scale.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

MEMSCALE™: A Scalable Environment for Databases.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Unleash Your Memory-Constrained Applications: A 32-Node Non-coherent Distributed-Memory Prototype Cluster.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

A power-efficient network on-chip topology.
Proceedings of the Fifth International Workshop on Interconnection Network Architecture, 2011

Highly scalable barriers for future high-performance computing clusters.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Enabling CUDA acceleration within virtual machines using rCUDA.
Proceedings of the 18th International Conference on High Performance Computing, 2011

A Dynamic Power-Aware Partitioner with Task Migration for Multicore Embedded Systems.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Towards an Efficient NoC Topology through Multiple Injection Ports.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

MEMSCALE: in-cluster-memory databases.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Improving Last-Level Cache Performance by Exploiting the Concept of MRU-Tour.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Dealing with Transient Faults in the Interconnection Network of CMPs at the Cache Coherence Level.
IEEE Trans. Parallel Distributed Syst., 2010

Buffer Management Strategies to Reduce HoL Blocking.
IEEE Trans. Parallel Distributed Syst., 2010

Power saving in regular interconnection networks.
Parallel Comput., 2010

Ensuring the performance and scalability of peer-to-peer distributed virtual environments.
Future Gener. Comput. Syst., 2010

Dynamic task set partitioning based on balancing resource requirements and utilization to reduce power consumption.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Balancing Task Resource Requirements in Embedded Multithreaded Multicore Processors to Reduce Power Consumption.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

A Scalable and Early Congestion Management Mechanism for MINs.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing.
Proceedings of the NOCS 2010, 2010

Improving the Performance of GALS-Based NoCs in the Presence of Process Variation.
Proceedings of the NOCS 2010, 2010

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters.
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

Cost-Effective Congestion Management for Interconnection Networks Using Distributed Deterministic Routing.
Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems, 2010

Extending a Multicore Multithread Simulator to Model Power-Aware Hard Real-Time Systems.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2010

A practical way to extend shared memory support beyond a motherboard at low cost.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

A Scheduling Heuristic to Handle Local and Remote Memory in Cluster Computers.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

EMC²: Extending Magny-Cours coherence for large-scale servers.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

VCTlite: Towards an efficient implementation of virtual cut-through switching in on-chip networks.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

An Efficient Strategy for Reducing Head-of-Line Blocking in Fat-Trees.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A Latency-Efficient Router Architecture for CMP Systems.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

A methodology for the characterization of process variation in NoC links.
Proceedings of the Design, Automation and Test in Europe, 2010

Getting Rid of Coherency Overhead for Memory-Hungry Applications.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Exploiting subtrace-level parallelism in clustered processors.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Scalable hardware support for conditional parallelization.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Region-Based Routing: A Mechanism to Support Efficient Routing Algorithms in NoCs.
IEEE Trans. Very Large Scale Integr. Syst., 2009

M-GRASP: A GRASP With Memory for Latency-Aware Partitioning Methods in DVE Systems.
IEEE Trans. Syst. Man Cybern. Part A, 2009

A Switch Architecture Guaranteeing QoS Provision and HOL Blocking Elimination.
IEEE Trans. Parallel Distributed Syst., 2009

Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks.
IEEE Trans. Parallel Distributed Syst., 2009

A Complexity-Effective Out-of-Order Retirement Microarchitecture.
IEEE Trans. Computers, 2009

A new strategy to manage the InfiniBand arbitration tables.
J. Parallel Distributed Comput., 2009

Efficient implementation of distributed routing algorithms for NoCs.
IET Comput. Digit. Tech., 2009

A performance evaluation of 2D-mesh, ring, and crossbar interconnects for chip multi-processors.
Proceedings of the Second International Workshop on Network on Chip Architectures, 2009

An hybrid eDRAM/SRAM macrocell to implement first-level data caches.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Yield-oriented evaluation methodology of network-on-chip routing implementations.
Proceedings of the 2008 IEEE International Symposium on System-on-Chip, 2009

A new mechanism to deal with process variability in NoC links.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Dynamic task set partitioning based on balancing memory requirements to reduce power consumption.
Proceedings of the 23rd international conference on Supercomputing, 2009

Tutorial #1: Modern system interconnects.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009

HyperTransport™ technology tutorial.
Proceedings of the 2009 IEEE Hot Chips 21 Symposium (HCS), 2009

An Efficient Implementation of GPU Virtualization in High Performance Clusters.
Proceedings of the Euro-Par 2009, 2009

Dependability Analysis of a Fault-Tolerant Network Reconfiguring Strategy.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Efficient Low-Complexity Alternative to the ROB for Out-of-Order Retirement of Instructions.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2008
Extending the TokenCMP Cache Coherence Protocol for Low Overhead Fault Tolerance in CMP Architectures.
IEEE Trans. Parallel Distributed Syst., 2008

Efficient Deadline-Based QoS Algorithms for High-Performance Networks.
IEEE Trans. Computers, 2008

An Efficient and Deadlock-Free Network Reconfiguration Protocol.
IEEE Trans. Computers, 2008

On the Potential of NoC Virtualization for Multicore Chips.
Scalable Comput. Pract. Exp., 2008

A proposal for managing ASI fabrics.
J. Syst. Archit., 2008

Beyond Fat-tree: Unidirectional Load-Balanced Multistage Interconnection Network.
IEEE Comput. Archit. Lett., 2008

Logic-Based Distributed Routing for NoCs.
IEEE Comput. Archit. Lett., 2008

High-radix crossbar switches enabled by proximity communication.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Exploiting Wiring Resources on Interconnection Network: Increasing Path Diversity.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Improving Token Coherence by Multicast Coherence Messages.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Switch-Based Packing Technique for Improving Token Coherence Scalability.
Proceedings of the Ninth International Conference on Parallel and Distributed Computing, 2008

Exploring High-Dimensional Topologies for NoC Design Through an Integrated Analysis and Synthesis Framework.
Proceedings of the Second International Symposium on Networks-on-Chips, 2008

An Efficient Implementation of Distributed Routing Algorithms for NoCs.
Proceedings of the Second International Symposium on Networks-on-Chips, 2008

Efficient unicast and multicast support for CMPs.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

The impact of out-of-order commit in coarse-grain, fine-grain and simultaneous multithreaded architectures.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Epoch-based reconfiguration: Fast, simple, and effective dynamic network reconfiguration.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A simple power-aware scheduling for multicore systems when running real-time applications.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Network Reconfiguration Suitability for Scientific Applications.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

On the Potentials of Segment-Based Routing for NoCs.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

RUFT: Simplifying the Fat-Tree Topology.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

An Efficient Switching Technique for NoCs with Reduced Buffer Requirements.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

Fault-Tolerant Cache Coherence Protocols for CMPs: Evaluation and Trade-Offs.
Proceedings of the High Performance Computing, 2008

FBICM: Efficient Congestion Management for High-Performance Networks Using Distributed Deterministic Routing.
Proceedings of the High Performance Computing, 2008

A Communication-Aware Topological Mapping Technique for NoCs.
Proceedings of the Euro-Par 2008, 2008

Reducing Packet Dropping in a Bufferless NoC.
Proceedings of the Euro-Par 2008, 2008

On the Influence of the Packet Marking and Injection Control Schemes in Congestion Management for MINs.
Proceedings of the Euro-Par 2008, 2008

A fault-tolerant directory-based cache coherence protocol for CMP architectures.
Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008

CART: Communication-Aware Routing Technique for Application-Specific NoCs.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

2007
A Latency-Aware Partitioning Method for Distributed Virtual Environment Systems.
IEEE Trans. Parallel Distributed Syst., 2007

A New Cost-Effective Technique for QoS Support in Clusters.
IEEE Trans. Parallel Distributed Syst., 2007

Exploring IBA Design Space for Improved Performance.
IEEE Trans. Parallel Distributed Syst., 2007

Handling Topology Changes in InfiniBand.
IEEE Trans. Parallel Distributed Syst., 2007

A Formal Model to Manage the InfiniBand Arbitration Tables Providing QoS.
IEEE Trans. Computers, 2007

A genetic approach for adding QoS to distributed virtual environments.
Comput. Commun., 2007

On the Characterization of Peer-To-Peer Distributed Virtual Environments.
Proceedings of the IEEE Virtual Reality Conference, 2007

Boosting Ethernet Performance by Segment-Based Routing.
Proceedings of the 15th Euromicro International Conference on Parallel, 2007

Congestion Management in MINs through Marked and Validated Packets.
Proceedings of the 15th Euromicro International Conference on Parallel, 2007

An Effective Starvation Avoidance Mechanism to Enhance the Token Coherence Protocol.
Proceedings of the 15th Euromicro International Conference on Parallel, 2007

Region-Based Routing: An Efficient Routing Mechanism to Tackle Unreliable Hardware in Network on Chips.
Proceedings of the First International Symposium on Networks-on-Chips, 2007

An Efficient Fault-Tolerant Routing Methodology for Fat-Tree Interconnection Networks.
Proceedings of the Parallel and Distributed Processing and Applications, 2007

Efficient Switches with QoS Support for Clusters.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Deadline-based QoS Algorithms for High-performance Networks.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Deterministic versus Adaptive Routing in Fat-Trees.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

RECN-IQ: A Cost-Effective Input-Queued Switch Architecture with Congestion Management.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Providing Full QoS with 2 VCs in High-Speed Switches.
Proceedings of the Information Networking. Towards Ubiquitous Networking and Services, 2007

Power-Aware Fat-Tree Networks Using On/Off Links.
Proceedings of the High Performance Computing and Communications, 2007

A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Integrated QoS Provision and Congestion Management for Interconnection Networks.
Proceedings of the Euro-Par 2007, 2007

VB-MT: Design Issues and Performance of the Validation Buffer Microarchitecture for Multithreaded Processors.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006
A Routing Methodology for Achieving Fault Tolerance in Direct Networks.
IEEE Trans. Computers, 2006

An Efficient Fault-Tolerant Routing Strategy for Tori and Meshes.
Scalable Comput. Pract. Exp., 2006

Efficient, Scalable Congestion Management for Interconnection Networks.
IEEE Micro, 2006

FIR: An efficient routing strategy for tori and meshes.
J. Parallel Distributed Comput., 2006

MMR: A MultiMedia Router architecture to support hybrid workloads.
J. Parallel Distributed Comput., 2006

Full QoS Support with 2 VCs for Single-chip Switches.
Proceedings of the Fifth IEEE International Symposium on Network Computing and Applications, 2006

Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Dynamic power saving in fat-tree interconnection networks using on/off links.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A Scalable Synchronization Technique for Distributed Virtual Environments Based on Networked-Server Architectures.
Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPP Workshops 2006), 2006

Dynamic Fault Tolerance with Misrouting in Fat Trees.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

RECN-DD: A Memory-Efficient Congestion Management Technique for Advanced Switching.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Destination-Based HoL Blocking Elimination.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Reachability-Based Fault-Tolerant Routing.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Scalable Low-Cost QoS Support for Single-chip Switches.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

QoS Support for Video Transmission in High-Speed Interconnects.
Proceedings of the High Performance Computing and Communications, 2006

Towards a Cost-Effective Interconnection Network Architecture with QoS and Congestion Management Support.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

On the Influence of the Selection Function on the Performance of Fat-Trees.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Providing Full Awareness to Distributed Virtual Environments Based on Peer-to-Peer Architectures.
Proceedings of the Advances in Computer Graphics, 2006

Towards an efficient switch architecture for high-radix switches.
Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2006

2005
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures.
IEEE Trans. Parallel Distributed Syst., 2005

Improving the Performance of Distributed Virtual Environment Systems.
IEEE Trans. Parallel Distributed Syst., 2005

Part II: A Methodology for Developing Deadlock-Free Dynamic Network Reconfiguration Processes.
IEEE Trans. Parallel Distributed Syst., 2005

Part I: A Theory for Deadlock-Free Dynamic Network Reconfiguration.
IEEE Trans. Parallel Distributed Syst., 2005

Traffic Scheduling Solutions with QoS Support for an Input-Buffered MultiMedia Router.
IEEE Trans. Parallel Distributed Syst., 2005

A Family of Mechanisms for Congestion Control in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 2005

A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2005

Enforcing in-order packet delivery in system area networks with adaptive routing.
J. Parallel Distributed Comput., 2005

A Method for Providing QoS in Distributed Virtual Environments.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

A Memory-Effective Fault-Tolerant Routing Strategy for Direct Interconnection Networks.
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

A Scalable Methodology for Computing Fault-Free Paths in InfiniBand Torus Networks.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Studying the Influence of the InfiniBand Packet Size to Guarantee QoS.
Proceedings of the 10th IEEE Symposium on Computers and Communications (ISCC 2005), 2005

A Sexual Elitist Genetic Algorithm for Providing QoS in Distributed Virtual Environment Systems.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

In-Order Packet Delivery in Interconnection Networks using Adaptive Routing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

A Memory-Effective Routing Strategy for Regular Interconnection Networks.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Efficient Reduction of HOL Blocking in Multistage Networks.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Power Saving in Regular Interconnection Networks Built with High-Degree Switches.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Congestion Control in InfiniBand Networks.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Providing Full QoS Support in Clusters Using Only Two VCs at the Switches.
Proceedings of the High Performance Computing, 2005

On the Correct Sizing on Meshes Through an Effective Congestion Management Strategy.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Cost / Performance Trade-Offs and Fairness Evaluation of Queue Mapping Policies.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
An Effective Methodology to Improve the Performance of the Up*/Down* Routing Algorithm.
IEEE Trans. Parallel Distributed Syst., 2004

QoS in InfiniBand Subnetworks.
IEEE Trans. Parallel Distributed Syst., 2004

An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration.
IEEE Trans. Parallel Distributed Syst., 2004

Deadlock-free dynamic reconfiguration over InfiniBand™ networks.
Parallel Algorithms Appl., 2004

On the development of a communication-aware task mapping technique.
J. Syst. Archit., 2004

An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori.
IEEE Comput. Archit. Lett., 2004

A Cost-Effective Technique to Reduce HOL Blocking in Single-Stage and Multistage Switch Fabrics.
Proceedings of the 12th Euromicro Workshop on Parallel, 2004

An analysis of deadlock risk during centralized network mapping.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2004

A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2004

A Transition-Based Fault-Tolerant Routing Methodology for InfiniBand Networks.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Use of Provisional Routes to Speed-up Change Assimilation in InfiniBand Networks.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

An Effective Fault-Tolerant Routing Methodology for Direct Networks.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

LASH-TOR: A Generic Transition-Oriented Routing Algorithm.
Proceedings of the 10th International Conference on Parallel and Distributed Systems, 2004

A Comparison Study of Metaheuristic Techniques for Providing QoS to Avatars in DVE Systems.
Proceedings of the Computational Science and Its Applications, 2004

Simple Deadlock-Free Dynamic Network Reconfiguration.
Proceedings of the High Performance Computing, 2004

A New Adaptive Fault-Tolerant Routing Methodology for Direct Networks.
Proceedings of the High Performance Computing, 2004

Topic 14: Routing and Communication in Interconnection Networks.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Deadlock-Free Dynamic Reconfiguration Schemes for Increased Network Dependability.
IEEE Trans. Parallel Distributed Syst., 2003

FC3D: Flow Control-Based Distributed Deadlock Detection Mechanism for True Fully Adaptive Routing in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 2003

Applying In-Transit Buffers to Boost the Performance of Networks with Source Routing.
IEEE Trans. Computers, 2003

Supporting adaptive routing in IBA switches.
J. Syst. Archit., 2003

Scalable Hardware-Based Multicast Trees.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

A Method for Applying Double Scheme Dynamic Reconfiguration over InfiniBand™.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

A Hardware Approach to QoS Support in Cluster Environments: The Multimedia Router MMR.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

Supporting Adaptive Routing in InfiniBand Networks.
Proceedings of the 11th Euromicro Workshop on Parallel, 2003

LSOM: A Link State Protocol Over Mac Addresses for Metropolitan Backbones Using Optical Ethernet Switches.
Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA 2003), 2003

Performance Evaluation of COWs under Real Parallel Application.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Supporting Fully Adaptive Routing in InfiniBand Networks.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

VOQSW: A Methodology to Reduce HOL Blocking in InfiniBand Networks.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A Solution for Handling Hybrid Traffic in Clustered Environments: The MultiMedia Router MMR.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Routing in InfiniBand™ Torus Network Topologies.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

A Methodology for Developing Dynamic Network Reconfiguration Processes.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Evaluation of a Subnet Management Mechanism for InfiniBand Networks.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

A New Proposal to Fill in the InfiniBand Arbitration Tables.
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Performance Enhancement Techniques for InfiniBand™ Architecture.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

On the Characterization of Distributed Virtual Environment Systems.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Topic Introduction.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

On the InfiniBand Subnet Discovery Process.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

2002
Boosting the Performance of Myrinet Networks.
IEEE Trans. Parallel Distributed Syst., 2002

A Clustering Method for Modeling the Communication Requirements of Message-Passing Applications.
Comput. Artif. Intell., 2002

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Performance Sensitivity of Routing Algorithms to Failures in Networks of Workstations with Regular and Irregular Topologies.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Removing the Latency Overhead of the ITB Mechanism in COWs with Source Routing.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Increasing the Adaptivity of Routing Algorithms for k-ary n-cubes.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Reducing the Latency of L2 Misses in Shared-Memory Multiprocessors through On-Chip Directory Integration.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Improving the Performance of Real-Time Communication Services on High-Speed LANs under Topology Changes.
Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN 2002), 2002

Improving InfiniBand Routing through Multiple Virtual Networks.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Avoiding Network Congestion with Local Information.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Analyzing the Influence of Virtual Lanes on the Performance of InfiniBand Networks.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Workshop Introduction.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Investigating Switch Scheduling Algorithms to Support QoS in the Multimedia Router.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

A Strategy to Compute the InfiniBand Arbitration Tables.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

A Strategy to Manage Time Sensitive Traffic in InfiniBand.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Effective Methodology for Deadlock-Free Minimal Routing in InfiniBand Networks.
Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

A multimedia router architecture to provide high performance and QoS guarantees to mixed traffic.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Algorithms for Switch-Scheduling in the Multimedia Router for LANs.
Proceedings of the High Performance Computing, 2002

Evaluation of Routing Algorithms for InfiniBand Networks (Research Note).
Proceedings of the Euro-Par 2002, 2002

Congestion Control Based on Transmission Times.
Proceedings of the Euro-Par 2002, 2002

Memory Conscious 3D Wavelet Transform.
Proceedings of the 28th EUROMICRO Conference 2002, 4-6 September 2002, Dortmund, Germany, 2002

Integrated Admission and Congestion Control for QoS Support in Clusters.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

Efficient Interconnects for Clustered Microarchitectures.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

A new switch scheduling algorithm to improve QoS in the multimedia router.
Proceedings of the IEEE 5th Workshop on Multimedia Signal Processing, 2002

2001
A Cost-Effective Approach to Deadlock Handling in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 2001

A General Theory for Deadlock-Free Adaptive Routing Using a Mixed Set of Resources.
IEEE Trans. Parallel Distributed Syst., 2001

A Protocol for Deadlock-Free Dynamic Reconfiguration in High-Speed Local Area Networks.
IEEE Trans. Parallel Distributed Syst., 2001

A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.
J. Parallel Distributed Comput., 2001

Towards a Communication-Aware Task Scheduling Strategy for Heterogeneous Systems.
Comput. Artif. Intell., 2001

On the Relative Behavior of Source and Distributed Routing in NOWs Using Up/Down Routing Schemes.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

On the Impact of Message Packetization in Networks of Workstations with Irregular Topology.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

A Congestion Control Mechanism for Wormhole Networks.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

On the Scalability of Topologies for Storage Area Networks in Building Environments.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

On the Design of High-Speed Switch Fabrics.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

Influence of Network Size and Load on the Performance of Reconfiguration Protocols.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

On the Interconnection Topology for Storage Area Networks.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Improving Network Performance by Reducing Network Contention in Source-Based COWs with a Low Path-Computation Overhead.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

A New Approach to Provide Real-Time Services on High-Speed Local Area Networks.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

A First Implementation of In-Transit Buffers on Myrinet GM Software.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Tuning Buffer Size in the Multimedia Router (MMR).
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

A New Task Mapping Technique for Communication-Aware Scheduling Strategies.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

Effective Strategy to Compute Forwarding Tables for InfiniBand Networks.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Deadlock-Free Routing in InfiniBand through Destination Renaming.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Accurate Availability Model for Direct Interconnection Networks.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

On the Switch Architecture for Fibre Channel Storage Area Networks.
Proceedings of the Eighth International Conference on Parallel and Distributed Systems, 2001

A Cost-Effective Hardware Link Scheduling Algorithm for the Multimedia Router (MMR).
Proceedings of the Networking, 2001

A New Scalable Directory Architecture for Large-Scale Multiprocessors.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Performance Evaluation of Real-Time Communication Services on High-Speed LANs under Topology Changes.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

Improving Network Performance by Efficiently Dealing with Short Control Messages in Fibre Channel SANs.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Improving the Accuracy of Reliability Models for Direct Interconnection Networks.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Efficient 3d Wavelet Transform Decomposition For Video Compression.
Proceedings of the 2nd International Workshop on Digital and Computational Video (DCV 2001), 2001

A Tool for the Design and Evaluation of Fibre Channel Storage Area Networks.
Proceedings of the 34th Annual Simulation Symposium (SS 2001), 2001

2000
Software-Based Rerouting for Fault-Tolerant Pipelined Communication.
IEEE Trans. Parallel Distributed Syst., 2000

On the Use of Virtual Channels in Networks of Workstations with Irregular Topology.
IEEE Trans. Parallel Distributed Syst., 2000

High-Performance Routing in Networks of Workstations with Irregular Topology.
IEEE Trans. Parallel Distributed Syst., 2000

An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors.
J. Syst. Archit., 2000

Modeling and Simulation of Storage Area Networks.
Proceedings of the MASCOTS 2000, Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 29 August, 2000

On the Effect of Link Failures in Fibre Channel Storage Area Networks.
Proceedings of the 5th International Symposium on Parallel Architectures, 2000

An Accurate Analysis of Reliability Parameters in Meshes with Fault-Tolerant Adaptive Routing.
Proceedings of the 5th International Symposium on Parallel Architectures, 2000

A Flexible Routing Scheme for Networks of Workstations.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Performance Sensitivity of Routing Algorithms to Failures in Networks of Workstations.
Proceedings of the High Performance Computing, Third International Symposium, 2000

On the Influence of the Selection Function on the Performance of Networks of Workstations.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Switch Scheduling in the Multimedia Router (MMR).
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Improving Routing Performance in Myrinet Networks.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Performance evaluation of a new routing strategy for irregular networks with source routing.
Proceedings of the 14th international conference on Supercomputing, 2000

Characterization and Enhancement of Dynamic Mapping Heuristics for Heterogeneous Systems.
Proceedings of the 2000 International Workshop on Parallel Processing, 2000

The Double Scheme: Deadlock-Free Dynamic Reconfiguration of Cut-Through Networks.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

On the Design of Communication-Aware Task Scheduling Strategies for Heterogeneous Systems.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Fast Dynamic Reconfiguration in Irregular Networks.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Improving the Performance of Regular Networks with Source Routing.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Performance analysis of storage area networks using high-speed LAN interconnects.
Proceedings of the IEEE International Conference on Networks 2000: Networking Trends and Challenges in the New Millennium, 2000

Performance Evaluation of Dynamic Reconfiguration in High-Speed Local Area Networks.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Characterization and enhancement of Static Mapping Heuristics for Heterogeneous Systems.
Proceedings of the High Performance Computing, 2000

Routing and Communication in Interconnection Networks.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Characterization of Communications between Processes in Message-Passing Applications.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

A New Methodology to Compute Deadlock-Free Routing Tables for Irregular Networks.
Proceedings of the Network-Based Parallel Computing: Communication, 2000

Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages.
Proceedings of the Network-Based Parallel Computing: Communication, 2000

Extending Dynamic Reconfiguration to NOWs with Adaptive Routing.
Proceedings of the Network-Based Parallel Computing: Communication, 2000

On the Performance of Up*/Down* Routing.
Proceedings of the Network-Based Parallel Computing: Communication, 2000

1999
Dynamically Configurable Message Flow Control for Fault-Tolerant Routing.
IEEE Trans. Parallel Distributed Syst., 1999

Optimizing network throughput: optimal versus robust design.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999

A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Improving the performance of bristled CC-NUMA systems using virtual channels and adaptivity.
Proceedings of the 13th international conference on Supercomputing, 1999

Adaptive Bubble Router: A Design to Improve Performance in Torus Networks.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Performance Evaluation of Networks of Workstations with Hardware Shared Memory Model Using Execution-Driven Simulation.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Impact of Buffer Size on the Efficiency of Deadlock Detection.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

MMR: A High-Performance Multimedia Router - Architecture and Design Trade-Offs.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Is It Worth the Flexibility Provided by Irregular Topologies in Networks of Workstations?
Proceedings of the Network-Based Parallel Computing: Communication, 1999

Deadlock-Free Routing in Irregular Networks with Dynamic Reconfiguration.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

Performance Evaluation of the Multimedia Router with MPEG-2 Video Traffic.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

1998
A cost-effective methodology for the evaluation of interconnection networks.
J. Syst. Archit., 1998

Suboptimal-Optimal Routing for LAN Internetworking Using Transparent Bridges.
Int. J. Found. Comput. Sci., 1998

A lab course on computer architecture.
Proceedings of the 1998 workshop on Computer architecture education, 1998

Using channel pipelining in reconfigurable interconnection networks.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

Deadlock avoidance and adaptive routing in interconnection networks.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

On the Design of Network Routers for Multimedia Applications.
Proceedings of the 1998 International Conference on Parallel Processing Workshops, 1998

Improving Performance of Networks of Workstations by using Disha Concurrent.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

DRIL: Dynamically Reduced Message Injection Limitation Mechanism for Wormhole Networks.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

A New Transparent Bridge Protocol for LAN Internetworking using Topologies with Active Loops.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Convergence Points on Commercial Parallel Systems: Do We Have the Node Architecture? Do We Have the Network? Do We Have the Programming Paradigm?
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

A Very Efficient Distributed Deadlock Detection Mechanism for Wormhole Networks.
Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Virtual channel multiplexing in networks of workstations with irregular topology.
Proceedings of the 5th International Conference On High Performance Computing, 1998

Edinet: An Execution Driven Interconnection Network Simulator for DSM Systems.
Proceedings of the Computer Performance Evaluation: Modelling Techniques and Tools, 1998

A Tool for the Analysis of Reconfiguration and Routing Algorithms in Irregular Networks.
Proceedings of the Network-Based Parallel Computing: Communication, 1998

1997
A Theory of Fault-Tolerant Routing in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 1997

Channel Bypassing: A Deadlock-Free Flow Control Policy for Adaptive Routing in Wormhole Networks.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

Multilink extension to support deadlock-free adaptive non-minimal routing.
Proceedings of the Fifth Euromicro Workshop on Parallel and Distributed Processing (PDP '97), 1997

On the Reduction of Deadlock Frequency by Limiting Message Injection in Wormhole Networks.
Proceedings of the Parallel Computer Routing and Communication, 1997

Deadlock- and Livelock-Free Routing Protocols for Wave Switching.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Software-Based Deadlock Recovery Technique for True Fully Adaptive Routing in Wormhole Networks.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Architectural Support for Reducing Communication Overhead in Multiprocessor Interconnection Networks.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Improving the efficiency of adaptive routing in networks with irregular topology.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

LIFE: a limited injection, fully adaptive, recovery-based routing algorithm.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Interconnection network behavior on a multicomputer in the parallelization of the MPEG coding algorithm. Worm-hole vs. packet-switching routing.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Efficient Adaptive Routing in Networks of Workstations with Irregular Topology.
Proceedings of the Communication and Architectural Support for Network-Based Parallel Computing, 1997

Interconnection networks - an engineering approach.
IEEE, ISBN: 978-0-8186-7800-4, 1997

1996
A Necessary and Sufficient Condition for Deadlock-Free Routing in Cut-Through and Store-and-Forward Networks.
IEEE Trans. Parallel Distributed Syst., 1996

An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

Interconnection Network Design: A Statistical Analysis of Interactions between Factors.
Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996

Generalized Theory for Deadlock-Free Adaptive Wormhole Routing and its Application to Disha Concurrent.
Proceedings of IPPS '96, 1996

A High Performance Router Architecture for Interconnection Networks.
Proceedings of the 1996 International Conference on Parallel Processing, 1996

Optimal Topology for Distributed Shared-Memory Multiprocessors: Hypercubes Again?
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 1995

A Theory of Deadlock-Free Adaptive Multicast Routing in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 1995

Deadlock-Free Fully-Adaptive Minimal Routing Algorithms: Limitations and Solutions.
Comput. Artif. Intell., 1995

Configurable Flow Control Mechanisms for Fault-Tolerant Routing.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Software Based Fault-Tolerant Oblivious Routing in Pipelined Networks.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
A Theory to Increase the Effective Redundancy in Wormhole Networks.
Parallel Process. Lett., 1994

Improving the efficiency of virtual channels with time-dependent selection functions.
Future Gener. Comput. Syst., 1994

Bandwidth Requirements For Wormhole Switches: A Simple And Efficient Design.
Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994

Performance Evaluation of Adaptive Routing Algorithms for k-ary-n-cubes.
Proceedings of the Parallel Computer Routing and Communication, 1994

Is It Possible to Fairly Compare Interconnection Networks?
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, 1994

Scouting: Fully Adaptive, Deadlock-Free Routing in Faulty Pipelined Networks.
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, 1994

A Theory of Fault-Tolerant Routing in Wormhole Networks.
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, 1994

Adaptive Unicast and Multicast in 3D Mesh Networks.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

Highly adaptive wormhole routing algorithms for N-dimensional torus.
Proceedings of the Workshop on Interconnection Networks and Mapping and Scheduling Parallel Computations, 1994

1993
A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks.
IEEE Trans. Parallel Distributed Syst., 1993

On the Design of Deadlock-Free Adaptive Multicast Routing Algorithms.
Parallel Process. Lett., 1993

A New Theory of Deadlock-free Adaptive Multicast Routing in Wormhole Networks.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

Dynamic reconfiguration of multicomputer networks: limitations and tradeoffs.
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993

Deadlock-Free Adaptive Routing Algorithms for the 3D-Torus: Limitations and Solutions.
Proceedings of the PARLE '93, 1993

Grouping Virtual Channels for Deadlock-Free Adaptive Wormhole Routing.
Proceedings of the PARLE '93, 1993

1992
Channel Classes: A New Concept for Deadlock Avoidance in Wormhole Networks.
Parallel Process. Lett., 1992

1991
An algorithm for dynamic reconfiguration of a multicomputer network.
Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991

Deadlock-free adaptive routing algorithms for multicomputers: evaluation of a new algorithm.
Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991

On the Design of Deadlock-Free Adaptive Routing Algorithms for Multicomputers: Design Methodologies.
Proceedings of the PARLE '91: Parallel Architectures and Languages Europe, 1991

On the Design of Deadlock-Free Adaptive Routing Algorithms for Multicomputers: Theoretical Aspects.
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

