Michela Becchi

ORCID: 0000-0001-8353-2915

According to our database, Michela Becchi authored at least 81 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing.
CoRR, 2024

2023
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems.
CoRR, 2023

GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluating Asynchronous Parallel I/O on HPC Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Lightweight Huffman Coding for Efficient GPU Compression.
Proceedings of the 37th International Conference on Supercomputing, 2023

A Code Transformation to Improve the Efficiency of OpenCL Code on FPGA through Pipes.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

Runway: In-transit Data Compression on Heterogeneous HPC Systems.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
Enabling The Feed-Forward Design Model in OpenCL Using Pipes.
CoRR, 2022

Accelerating Random Forest Classification on GPU and FPGA.
Proceedings of the 51st International Conference on Parallel Processing, 2022

A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Data Transformation Acceleration using Deterministic Finite-State Transducers.
Proceedings of the IEEE International Conference on Big Data, 2022

2021
Exploring Thread Coarsening on FPGA.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

PILOT: a Runtime System to Manage Multi-tenant GPU Unified Memory Footprint.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

2020
A Loop-Aware Autotuner for High-Precision Floating-Point Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Optimizing Complex OpenCL Code for FPGA: A Case Study on Finite Automata Traversal.
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020

GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

A Flexible and Scalable NTT Hardware: Applications from Homomorphically Encrypted Deep Learning to Post-Quantum Cryptography.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

2019
Editorial: Special Issue on Computing Frontiers.
J. Signal Process. Syst., 2019

Evaluating High Performance Pattern Matching on the Automata Processor.
IEEE Trans. Computers, 2019

Characterizing the Performance/Accuracy Tradeoff of High-Precision Applications via Auto-tuning.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019

A comparative study of parallel programming frameworks for distributed GPU applications.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
A Compiler Framework for Fixed-Topology Non-Deterministic Finite Automata on SIMD Platforms.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Compiling SIMT Programs on Multi- and Many-Core Processors with Wide Vector Units: A Case Study with CUDA.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

2017
A Principled Approach to Secure Multi-core Processor Design with ReWire.
ACM Trans. Embed. Comput. Syst., 2017

Fast Integral Histogram Computations on GPU for Real-Time Video Analytics.
CoRR, 2017

Understanding the performance-accuracy tradeoffs of floating-point arithmetic on GPUs.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

Demystifying automata processing: GPUs, FPGAs or Micron's AP?
Proceedings of the International Conference on Supercomputing, 2017

An Analytical Study of Recursive Tree Traversal Patterns on Multi- and Many-Core Platforms.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

A Memory-Efficient GPU Method for Hamming and Levenshtein Distance Similarity.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016
Picking Pesky Parameters: Optimizing Regular Expression Matching in Practice.
IEEE Trans. Parallel Distributed Syst., 2016

A programming model for reconfigurable computing based in functional concurrency.
Proceedings of the 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip, 2016

Compiler-Assisted Workload Consolidation for Efficient Dynamic Parallelism on GPU.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

High Performance Pattern Matching Using the Automata Processor.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Parallel Gene Upstream Comparison via Multi-Level Hash Tables on GPU.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Evaluating the Energy Efficiency of Deep Convolutional Neural Networks on CPUs and GPUs.
Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), 2016

O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection.
Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, 2016

2015
Fast support for unstructured data processing: the unified automata processor.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Semantics Driven Hardware Design, Implementation, and Verification with ReWire.
Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, 2015

Accelerating regular expression matching over compressed HTTP.
Proceedings of the 2015 IEEE Conference on Computer Communications, 2015

Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Improving Application Concurrency on GPUs by Managing Implicit and Explicit Synchronizations.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs.
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015

Hardware Synthesis from Functional Embedded Domain-Specific Languages: A Case Study in Regular Expression Compilation.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
Large-Scale Pairwise Alignments on GPU Clusters: Exploring the Implementation Space.
J. Signal Process. Syst., 2014

Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence.
IEEE J. Sel. Areas Commun., 2014

GRapid: A compilation and runtime framework for rapid prototyping of graph applications on many-core processors.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

A flexible scheduling framework for heterogeneous CPU-GPU clusters.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Design of a hybrid MPI-CUDA benchmark suite for CPU-GPU clusters.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A-DFA: A Time- and Space-Efficient DFA Compression Algorithm for Fast Regular Expression Evaluation.
ACM Trans. Archit. Code Optim., 2013

Scheduling concurrent applications on a cluster of CPU-GPU nodes.
Future Gener. Comput. Syst., 2013

Exploring different automata representations for efficient regular expression matching on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Deploying Graph Algorithms on GPUs: An Adaptive Solution.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Semantics-directed machine architecture in ReWire.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

GPU acceleration of regular expression matching for large datasets: exploring the implementation space.
Proceedings of the Computing Frontiers Conference, 2013

A distributed CPU-GPU framework for pairwise alignments on large-scale sequence datasets.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification.
ACM Trans. Archit. Code Optim., 2012

Formal Semantics of Heterogeneous CUDA-C: A Modular Approach with Applications.
Proceedings of the Seventh Conference on Systems Software Verification, 2012

ValuePack: value-based scheduling framework for CPU-GPU clusters.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Multiple Pairwise Sequence Alignments with the Needleman-Wunsch Algorithm on GPU.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Multiple Pairwise Sequence Alignments with the Needleman-Wunsch Algorithm on GPU.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

A virtual memory based runtime to support multi-tenancy in clusters with GPUs.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Efficient GPU Implementation of the Integral Histogram.
Proceedings of the Computer Vision - ACCV 2012 Workshops, 2012

2011
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

2010
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory.
Proceedings of SPAA 2010: 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

A programmable parallel accelerator for learning and classification.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Evaluating regular expression matching engines on network and general purpose processors.
Proceedings of the 2009 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2009

2008
Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures.
J. Instr. Level Parallelism, 2008

A workload for evaluating deep packet inspection architectures.
Proceedings of the 4th International Symposium on Workload Characterization (IISWC 2008), 2008

Extending finite automata to efficiently match Perl-compatible regular expressions.
Proceedings of the 2008 ACM Conference on Emerging Network Experiment and Technology, 2008

A remotely accessible network processor-based router for network experimentation.
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008

Efficient regular expression evaluation: theory to practice.
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2008

2007
Memory-Efficient Regular Expression Search Using State Merging.
Proceedings of INFOCOM 2007, 26th IEEE International Conference on Computer Communications, 2007

A hybrid finite automaton for practical deep packet inspection.
Proceedings of the 2007 ACM Conference on Emerging Network Experiment and Technology, 2007

Performance/area efficiency in chip multiprocessors with micro-caches.
Proceedings of the 4th Conference on Computing Frontiers, 2007

An improved algorithm to accelerate regular expression evaluation.
Proceedings of the 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2007

2006
CAMP: fast and efficient IP lookup architecture.
Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, 2006

