Fabrizio Petrini

ORCID: 0000-0002-4977-7107

According to our database, Fabrizio Petrini authored at least 135 papers between 1991 and 2024.

Bibliography

2024
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation.
CoRR, 2024

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

2023
The Intel Programmable and Integrated Unified Memory Architecture Graph Analytics Processor.
IEEE Micro, 2023

Recent Trends in Graph Decomposition (Dagstuhl Seminar 23331).
Dagstuhl Reports, 2023

Open Problems in (Hyper)Graph Decomposition.
CoRR, 2023

PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks.
CoRR, 2023

In-network Allreduce with Multiple Spanning Trees on PolarFly.
Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Accelerating Allreduce With In-Network Reduction on Intel PIUMA.
IEEE Micro, 2022

Ridgeline: A 2D Roofline Model for Distributed Systems.
CoRR, 2022

SU3_Bench on a Programmable Integrated Unified Memory Architecture (PIUMA) and How that Differs from Standard NUMA CPUs.
Proceedings of the High Performance Computing - 37th International Conference, 2022

PolarFly: A Cost-Effective and Flexible Low-Diameter Topology.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Efficient, out-of-memory sparse MTTKRP on massively parallel architectures.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Accelerating Prefix Scan with in-network computing on Intel PIUMA.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

2021
A New Parallel Algorithm for Sinkhorn Word-Movers Distance and Its Performance on PIUMA and Xeon CPU.
CoRR, 2021

Performance Optimization of SU3_Bench on Xeon and Programmable Integrated Unified Memory Architecture.
CoRR, 2021

Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That's Different from CPU.
Proceedings of the High Performance Computing - 36th International Conference, 2021

High Performance Streaming Tensor Decomposition.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

ALTO: adaptive linearized storage of sparse tensors.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

In-network reductions on multi-dimensional HyperX.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2021

2020
Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 2.
ACM Trans. Parallel Comput., 2020

Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 1.
ACM Trans. Parallel Comput., 2020

Mapping Stencils on Coarse-grained Reconfigurable Spatial Architecture.
CoRR, 2020

PIUMA: Programmable Integrated Unified Memory Architecture.
CoRR, 2020

An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance.
CoRR, 2020

Online and Real-time Object Tracking Algorithm with Extremely Small Matrices.
CoRR, 2020

Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Breaking the Scalability Wall.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2017
Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Exploring optimizations on shared-memory platforms for parallel triangle counting algorithms.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Truss decomposition on shared-memory parallel systems.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016
An Early Performance Study of Large-Scale POWER8 SMP Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Subgraph Counting: Color Coding Beyond Trees.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics.
Computer, 2015

Exploring network optimizations for large-scale graph analytics.
Proceedings of the International Conference for High Performance Computing, 2015

Scalable Community Detection with the Louvain Algorithm.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
A Throughput-Optimized Optical Network for Data-Intensive Computing.
IEEE Micro, 2014

Hourglass: A Bandwidth-Driven Performance Model for Sorting Algorithms.
Proceedings of the Supercomputing - 29th International Conference, 2014

Performance Analysis of Graph Algorithms on P7IH.
Proceedings of the Supercomputing - 29th International Conference, 2014

Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Massive data analytics: The Graph 500 on IBM Blue Gene/Q.
IBM J. Res. Dev., 2013

2012
Top Picks from Hot Interconnects 2011: Petascale Network Architectures.
IEEE Micro, 2012

Looking under the hood of the IBM Blue Gene/Q network.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Breaking the speed and scalability barriers for graph exploration on distributed-memory machines.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Performance evaluation of interthread communication mechanisms on multicore/multithreaded architectures.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

2011
Ultra low latency market data feed on IBM PowerEN™.
Comput. Sci. Res. Dev., 2011

Characterization of the Communication Patterns of Scientific Applications on Blue Gene/P.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
DotStar: breaking the scalability and performance barriers in parsing regular expressions.
Comput. Sci. Res. Dev., 2010

Tools for Very Fast Regular Expression Matching.
Computer, 2010

Intra-Socket and Inter-Socket Communication in Multi-core Systems.
IEEE Comput. Archit. Lett., 2010

Scalable Graph Exploration on Multicore Processors.
Proceedings of the Conference on High Performance Computing Networking, 2010

Streaming, low-latency communication in on-line trading systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Multicore and Manycore Programming.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

High Performance Topology-Aware Communication in Multicore Processors.
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

Combinatorial Algorithm Design on the Cell/B.E. Processor.
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010

2009
Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks.
IEEE Trans. Parallel Distributed Syst., 2009

Guest Editors' Introduction: Hot Interconnects.
IEEE Micro, 2009

Faster FAST: multicore acceleration of streaming financial data.
Comput. Sci. Res. Dev., 2009

Applying Amdahl's Other Law to the data center.
IBM J. Res. Dev., 2009

SCAMPI: a scalable CAM-based algorithm for multiple pattern inspection.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Fulcrum's FocalPoint FM4000: A Scalable, Low-Latency 10GigE Switch for High-Performance Data Centers.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

2008
Efficient Breadth-First Search on the Cell/BE Processor.
IEEE Trans. Parallel Distributed Syst., 2008

Accelerating Real-Time String Searching with Multicore Processors.
Computer, 2008

High-speed string searching against large dictionaries on the Cell/B.E. Processor.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Exact multi-pattern string matching on the Cell/B.E. processor.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Towards Fault Resilient Global Arrays.
Proceedings of the Parallel Computing: Architectures, 2007

Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Peak-Performance DFA-based String Matching on the Cell Processor.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Transparent system-level migration of PGAS applications using Xen on InfiniBand.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
STORM: Scalable Resource Management for Large-Scale Parallel Computers.
IEEE Trans. Computers, 2006

SFT: scalable fault tolerance.
ACM SIGOPS Oper. Syst. Rev., 2006

Guest Editors' Introduction: High-Performance Interconnects.
IEEE Micro, 2006

Cell Multiprocessor Communication Network: Built for Speed.
IEEE Micro, 2006

NIC-based reduction algorithms for large-scale clusters.
Int. J. High Perform. Comput. Netw., 2006

An Abstract Interface for System Software on Large-Scale Clusters.
Comput. J., 2006

A Locality-Aware Cooperative Cache Management Protocol to Improve Network File System Performance.
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 2006

2005
Adaptive Parallel Job Scheduling with Flexible Coscheduling.
IEEE Trans. Parallel Distributed Syst., 2005

QsNetII: Defining High-Performance Network Design.
IEEE Micro, 2005

Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Assessing MPI Performance on QsNetII.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Current Practice and a Direction Forward in Checkpoint/Restart Implementations for Fault Tolerance.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

EtherNET vs. EtherNOT.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

2004
A Performance Evaluation of an Alpha EV7 Processing Node.
Int. J. High Perform. Comput. Appl., 2004

A Performance and Scalability Analysis of the BlueGene/L Architecture.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

On the Feasibility of Incremental Checkpointing for Scientific Computing.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

System-Level Fault-Tolerance in Large-Scale Parallel Machines with Buffered Coscheduling.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Architectural Support for System Software on Large-Scale Clusters.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

What are the future trends in high-performance interconnects for parallel computers? [Panel 1].
Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

Designing Parallel Operating Systems via Parallel Programming.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Topic 14: Routing and Communication in Interconnection Networks.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Using multirail networks in high-performance clusters.
Concurr. Comput. Pract. Exp., 2003

Performance Evaluation of the Quadrics Interconnection Network.
Clust. Comput., 2003

The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Scalable NIC-based Reduction on Large-scale Clusters.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Scalable Hardware-Based Multicast Trees.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Parallel Job Scheduling under Dynamic Workloads.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003

Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Scalable collective communication on the ASCI Q machine.
Proceedings of the 11th Annual IEEE Symposium on High Performance Interconnects, 2003

2002
The Quadrics Network: High-Performance Clustering Technology.
IEEE Micro, 2002

STORM: lightning-fast resource management.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Performance Evaluation of I/O Traffic and Placement of I/O Nodes on a High Performance Network.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Scalable Resource Management in High Performance Computers.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001
Improved resource utilization with buffered coscheduling.
Parallel Algorithms Appl., 2001

Predictive performance and scalability modeling of a large-scale application.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Hardware- and Software-Based Collective Communication on the Quadrics Network.
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001

Performance Evaluation of the Quadrics Interconnection Network.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Gang Scheduling with Lightweight User-Level Communication.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

The Quadrics network (QsNet): high-performance clustering technology.
Proceedings of the Ninth Symposium on High Performance Interconnects, 2001

2000
Efficient Total-Exchange in Wormhole-Routed Toroidal Cubes.
Comput. Artif. Intell., 2000

Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements.
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000

Buffered Coscheduling: A New Methodology for Multitasking Parallel Jobs on Distributed Systems.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Scheduling with Global Information in Distributed Systems.
Proceedings of the 20th International Conference on Distributed Computing Systems, 2000

1999
A New Approach to Parallel Program Development and Scheduling of Parallel Jobs on Distributed Systems.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

1998
Performance Analysis of Wormhole Routed K-Ary N-Trees.
Int. J. Found. Comput. Sci., 1998

Total Exchange on k-ary n-cubes with Adaptive Routing.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

1997
Routing in Bidirectional k-ary n-cubes with the Red Rover Algorithm.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1997

On the Reduction of Deadlock Frequency by Limiting Message Injection in Wormhole Networks.
Proceedings of the Parallel Computer Routing and Communication, 1997

Performance Analysis of Minimal Adaptive Wormhole Routing with Time-Dependent Deadlock Recovery.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

k-ary n-trees: High Performance Networks for Massively Parallel Architectures.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Network Performance under Physical Constraints.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

LIFE: a limited injection, fully adaptive, recovery-based routing algorithm.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997

Efficient Total-Exchange in Wormhole-Routed Toroidal Cubes.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

SMART: A Simulator of Massive Architectures and Topologies.
Proceedings of the IASTED International Conference on Parallel and Distributed Systems, 1997

Efficient Personalized Communication on Wormhole Networks.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996
Minimal vs. non Minimal Adaptive Routing on k-ary n-cubes.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996

Latency and Bandwidth Requirements of Massively Parallel Programs: FFT as a Case Study.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1991
Pisa parallel processing project on general-purpose highly-parallel computers.
Proceedings of the Fifteenth Annual International Computer Software and Applications Conference, 1991

