Matthew Mattina

According to our database, Matthew Mattina authored at least 47 papers between 2002 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Design Principles for Lifelong Learning AI Accelerators.
CoRR, 2023

2022
UDC: Unified DNAS for Compressible TinyML Models.
CoRR, 2022

UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products.
ACM J. Emerg. Technol. Comput. Syst., 2021

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices.
CoRR, 2021

Information contraction in noisy binary neural networks and its implications.
CoRR, 2021

On the effects of quantisation on model uncertainty in Bayesian neural networks.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices.
Proceedings of Machine Learning and Systems 2021, 2021

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers.
Proceedings of Machine Learning and Systems 2021, 2021

Co-Designing Hardware and Models for Efficient On-Device ML Inference.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2021

Strong data processing inequality in neural networks with noisy neurons and its implications.
Proceedings of the IEEE International Symposium on Information Theory, 2021

Debiasing Model Updates for Improving Personalized Federated Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Federated Learning Based on Dynamic Regularization.
Proceedings of the 9th International Conference on Learning Representations, 2021

Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2020
Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration.
CoRR, 2020

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands.
CoRR, 2020

Compressing Language Models using Doped Kronecker Products.
CoRR, 2020

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation.
CoRR, 2020

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference.
IEEE Comput. Archit. Lett., 2020

Searching for Winograd-aware Quantized Networks.
Proceedings of Machine Learning and Systems 2020, 2020

A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids.
Proceedings of the Interspeech 2020, 2020

ISP4ML: The Role of Image Signal Processing in Efficient Deep Learning Vision Systems.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Rank and run-time aware compression of NLP Applications.
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

Efficient Residue Number System Based Winograd Convolution.
Proceedings of the Computer Vision - ECCV 2020, 2020

Ternary MobileNets via Per-Layer Hybrid Filter Banks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems.
CoRR, 2019

Compressing RNNs for IoT devices by 15-38x using Kronecker Products.
CoRR, 2019

Measuring scheduling efficiency of RNNs for NLP applications.
CoRR, 2019

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning.
CoRR, 2019

Pushing the limits of RNN Compression.
Proceedings of the Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing, 2019

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

FixyNN: Energy-Efficient Real-Time Mobile Computer Vision Hardware Acceleration via Transfer Learning.
Proceedings of Machine Learning and Systems 2019, 2019

Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications.
Proceedings of Machine Learning and Systems 2019, 2019

Learning Low-precision Neural Networks without Straight-Through Estimator (STE).
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Run-Time Efficient RNN Compression for Inference on Edge Devices.
Proceedings of the 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, 2019

Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs.
Proceedings of the 2nd Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, 2019

2018
Efficient and Robust Machine Learning for Real-World Systems.
CoRR, 2018

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning.
CoRR, 2018

SCALE-Sim: Systolic CNN Accelerator.
CoRR, 2018

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective.
CoRR, 2018

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2008
TILE64 Processor: A 64-Core SoC with Mesh Interconnect.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

2007
On-Chip Interconnection Architecture of the Tile Processor.
IEEE Micro, 2007

2006
Last level cache (LLC) performance of data mining workloads on a CMP - a case study of parallel bioinformatics workloads.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2002
Tarantula: A Vector Extension to the Alpha Architecture.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

