Francesco Conti

Fabrizio Indirli

Antonio Latella

Giacomo Michele Puglia

IEEE Access, 2024

2023

RedMulE: A mixed-precision matrix-matrix operation engine for flexible and energy-efficient on-chip linear algebra and TinyML training acceleration.

Future Gener. Comput. Syst., December, 2023

Reduced precision floating-point optimization for Deep Neural Network On-Device Learning on microcontrollers.

Future Gener. Comput. Syst., December, 2023

Graphene-Based Wireless Agile Interconnects for Massive Heterogeneous Multi-Chip Processors.

IEEE Wirel. Commun., August, 2023

Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge.

IEEE Trans. Computers, March, 2023

Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode.

IEEE Trans. Circuits Syst. I Regul. Pap., 2023

Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality with At-MRAM Neural Engine.

CoRR, 2023

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures.

CoRR, 2023

A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms.

CoRR, 2023

Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing.

CoRR, 2023

Echoes: a 200 GOPS/W Frequency Domain SoC with FFT Processor and I2S DSP for Flexible Data Acquisition from Microphone Arrays.

CoRR, 2023

DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training.

CoRR, 2023

Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space.

CoRR, 2023

RISC-V Processor Technologies for Aerospace Applications in the ISOLDE Project.

Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2023

A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023

A 12.4TOPS/W @ 136GOPS AI-IoT System-on-Chip with 16 RISC-V, 2-to-8b Precision-Scalable DNN Acceleration and 30%-Boost Adaptive Body Biasing.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

ECHOES: a 200 GOPS/W Frequency Domain SoC with FFT Processor and I2S DSP for Flexible Data Acquisition from Microphone Arrays.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

PULP Fiction No More - Dependable PULP Systems for Space.

Proceedings of the IEEE European Test Symposium, 2023

Siracusa: A Low-Power On-Sensor RISC-V SoC for Extended Reality Visual Processing in 16nm CMOS.

Proceedings of the 49th IEEE European Solid State Circuits Conference, 2023

End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Specialization meets Flexibility: a Heterogeneous Architecture for High-Efficiency, High-flexibility AR/VR Processing.

Arpan Suravi Prasad

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms.

Josse Van Delm

Maarten Vandersteegen

Giuseppe Maria Sarda

Marian Verhelst

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

WIP: Automatic DNN Deployment on Heterogeneous Platforms: the GAP9 Case Study.

Proceedings of the International Conference on Compilers, 2023

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge.

Georg Rutishauser

Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022

Vau Da Muntanialas: Energy-Efficient Multi-Die Scalable Acceleration of RNN Inference.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Vega: A Ten-Core SoC for IoT Endnodes With DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode.

IEEE J. Solid State Circuits, 2022

Fully Onboard AI-Powered Human-Drone Pose Estimation on Ultralow-Power Autonomous Flying Nano-UAVs.

Luca Maria Gambardella

Alessandro Giusti

Jérôme Guzzi

IEEE Internet Things J., 2022

A Heterogeneous In-Memory Computing Cluster for Flexible End-to-End Inference of Real-World Deep Neural Networks.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2022

Motor-Unit Ordering of Blindly-Separated Surface-EMG Signals for Gesture Recognition.

Mattia Orlandi

Elisa Donati

Simone Benatti

Proceedings of the Advances in System-Integrated Intelligence, 2022

PULP-TrainLib: Enabling On-Device Training for RISC-V Multi-core MCUs Through Performance-Driven Autotuning.

Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2022

ViT-LR: Pushing the Envelope for Transformer-Based on-Device Embedded Continual Learning.

Alberto Dequino

Proceedings of the 13th IEEE International Green and Sustainable Computing Conference, 2022

Darkside: 2.6GFLOPS, 8.7mW Heterogeneous RISC-V Cluster for Extreme-Edge On-Chip DNN Inference and Training.

Proceedings of the 48th IEEE European Solid State Circuits Conference, 2022

RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

SNE: an Energy-Proportional Digital Accelerator for Sparse Event-Based Convolutions.

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

sEMG Neural Spikes Reconstruction for Gesture Recognition on a Low-Power Multicore Processor.

Mattia Orlandi

Victor Javier Kartsch Morinigo

Simone Benatti

Proceedings of the IEEE Biomedical Circuits and Systems Conference, 2022

Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference.

Albert Cabellos-Aparicio

Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

2021

RNN-Based Radio Resource Management on Multicore RISC-V Accelerator Architectures.

IEEE Trans. Very Large Scale Integr. Syst., 2021

XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes.

IEEE Trans. Emerg. Top. Comput., 2021

DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs.

IEEE Trans. Computers, 2021

A TinyML Platform for On-Device Continual Learning With Quantized Latent Replays.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2021

Improving Autonomous Nano-Drones Performance via Automated End-to-End Optimization and Deployment of DNNs.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2021

Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode.

CoRR, 2021

Fully Onboard AI-powered Human-Drone Pose Estimation on Ultra-low Power Autonomous Flying Nano-UAVs.

Luca Maria Gambardella

Alessandro Giusti

Jérôme Guzzi

CoRR, 2021

A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things.

Maurizio Capra

Maurizio Martina

Proceedings of the 64th IEEE International Midwest Symposium on Circuits and Systems, 2021

4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode.

Proceedings of the IEEE International Solid-State Circuits Conference, 2021

TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference.

Alberto Dequino

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021

GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors.

Proceedings of the 39th IEEE International Conference on Computer Design, 2021

A 1.15 TOPS/W, 16-Cores Parallel Ultra-Low Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode.

Proceedings of the 47th ESSCIRC 2021, 2021

A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment.

Proceedings of the 24th Euromicro Conference on Digital System Design, 2021

Fünfiiber-Drone: A Modular Open-Platform 18-grams Autonomous Nano-Drone.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks.

Matteo Risso

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

A Microcontroller is All You Need: Enabling Transformer Execution on Low-Power IoT Endnodes.

Proceedings of the 2021 IEEE International Conference on Omni-Layer Intelligent Systems, 2021

Architecting more than Moore: wireless plasticity for massive heterogeneous computer architectures (WiPLASH).

Proceedings of the CF '21: Computing Frontiers Conference, 2021

To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters.

Luca Bertaccini

Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?

Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020

Always-On 674 μW @ 4 GOP/s Error Resilient Binary Neural Networks With Aggressive SRAM Voltage Scaling on a 22-nm IoT End-Node.

Alfio Di Mauro

IEEE Trans. Circuits Syst., 2020

Robust Real-Time Embedded EMG Recognition Framework Using Temporal Convolutional Networks on a Multicore IoT Processor.

Simone Benatti

Victor Javier Kartsch

Teresa Serrano-Gotarredona

IEEE Trans. Biomed. Circuits Syst., 2020

Introduction to the Special Issue on the 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2020).

Maurizio Valle

Hai Li

IEEE J. Emerg. Sel. Topics Circuits Syst., 2020

Exploring NEURAghe: A Customizable Template for APSoC-Based CNN Inference at the Edge.

IEEE Embed. Syst. Lett., 2020

XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Network on RISC-V based IoT End Nodes.

CoRR, 2020

Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors.

CoRR, 2020

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node.

Alfio Di Mauro

CoRR, 2020

Technical Report: NEMO DNN Quantization for Deployment Model.

CoRR, 2020

Memory-Latency-Accuracy Trade-Offs for Continual Learning on a RISC-V Extreme-Edge Node.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2020

A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference.

Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI, 2020

XpulpNN: Accelerating Quantized Neural Networks on RISC-V Processors Through ISA Extensions.

Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Enabling mixed-precision quantized neural networks in extreme-edge devices.

Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

Temporal Variability Analysis in sEMG Hand Grasp Recognition using Temporal Convolutional Networks.

Proceedings of the 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2020

2019

A 64-mW DNN-Based Visual Navigation Engine for Autonomous Nano-Drones.

IEEE Internet Things J., 2019

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors.

CoRR, 2019

Optimally Scheduling CNN Convolutions for Efficient Memory Access.

Arthur Stoutchinin

CoRR, 2019

PULP-NN: A Computing Library for Quantized Neural Network inference at the edge on RISC-V Based Parallel Ultra Low Power Clusters.

Proceedings of the 26th IEEE International Conference on Electronics, Circuits and Systems, 2019

An Open Source and Open Hardware Deep Learning-Powered Visual Navigation Engine for Autonomous Nano-UAVs.

Daniele Palossi

Proceedings of the 15th International Conference on Distributed Computing in Sensor Systems, 2019

DORY: Lightweight memory hierarchy management for deep NN inference on IoT endnodes: work-in-progress.

Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, 2019

Optimization and deployment of CNNs at the edge: the ALOHA experience.

Ilias Theodorakopoulos

Michael Masin

Francesca Palumbo

Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs.

ACM Trans. Reconfigurable Technol. Syst., 2018

A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing.

IEEE Trans. Circuits Syst. II Express Briefs, 2018

XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Ultra Low Power Deep-Learning-powered Autonomous Nano Drones.

CoRR, 2018

Architecture-aware design and implementation of CNN algorithms for embedded inference: the ALOHA project.

Ilias Theodorakopoulos

Proceedings of the 30th International Conference on Microelectronics, 2018

ALOHA: an architectural-aware framework for deep learning at the edge.

Ilias Theodorakopoulos

Michael Masin

Francesca Palumbo

Proceedings of the Workshop on INTelligent Embedded Systems Architectures and Applications, 2018

Quantized NNs as the definitive solution for inference on low-power ARM MCUs?: work-in-progress.

Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2018

Chipmunk: A systolically scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW accelerator for near-sensor recurrent neural network inference.

Proceedings of the 2018 IEEE Custom Integrated Circuits Conference, 2018

Thermal image-based CNN's for ultra-low power people recognition.

Andres Gomez

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

GAP-8: A RISC-V SoC for AI at the Edge of the IoT.

Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

2017

Accelerated Visual Context Classification on a Low-Power Smartwatch.

IEEE Trans. Hum. Mach. Syst., 2017

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics.

Robert Schilling

Antonio Pullini

Frank Kagan Gürkaynak

Michael Muehlberghuber

IEEE Trans. Circuits Syst. I Regul. Pap., 2017

A Self-Aware Architecture for PVT Compensation and Power Nap in Near Threshold Processors.

Igor Loi

Antonio Pullini

Thomas Christoph Müller

IEEE Des. Test, 2017

Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference.

CoRR, 2017

Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications.

Proceedings of the 27th International Symposium on Power and Timing Modeling, 2017

Multi-core data analytics SoC with a flexible 1.76 Gbit/s AES-XTS cryptographic accelerator in 65 nm CMOS.

Frank K. Gürkaynak

Robert Schilling

Michael Muehlberghuber

Stefan Mangard

Proceedings of the Fourth Workshop on Cryptography and Security in Computing Systems, 2017

An Ultra-Low Power Address-Event Sensor Interface for Energy-Proportional Time-to-Information Extraction.

Alfio Di Mauro

Proceedings of the 54th Annual Design Automation Conference, 2017

2016

Heterogeneous Architectures For Parallel Acceleration.

PhD thesis, 2016

He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores.

J. Signal Process. Syst., 2016

PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision.

J. Signal Process. Syst., 2016

On-the-fly adaptivity for process networks over shared-memory platforms.

Microprocess. Microsystems, 2016

A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC.

Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2016

A heterogeneous multi-core system-on-chip for energy efficient brain inspired vision.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Enabling the heterogeneous accelerator model on ultra-low power microcontroller platforms.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA.

Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

2015

PULP: A parallel ultra low power platform for next generation IoT applications.

Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014

Energy-efficient vision on the PULP platform for ultra-low power parallel computing.

Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Online process transformation for polyhedral process networks in shared-memory MPSoCs.

Proceedings of the 3rd Mediterranean Conference on Embedded Computing, 2014

A Stream Buffer Mechanism for Pervasive Splitting Transformations on Polyhedral Process Networks.

Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Brain-Inspired Classroom Occupancy Monitoring on a Low-Power Mobile Platform.

Antonio Pullini

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014

He-P2012: Architectural heterogeneity exploration on a scalable many-core platform.

Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013

Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters.

Andrea Marongiu