José-María Arnau

J. Parallel Distributed Comput., April, 2023

SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks.

Reza Yazdani Aminabadi

ACM Trans. Embed. Comput. Syst., March, 2023

Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads.

Albert Segura

J. Supercomput., 2023

A Lightweight, Compiler-Assisted Register File Cache for GPGPU.

Mojtaba Abaie Shoushtary

Jordi Tubella Murgadas

CoRR, 2023

Vitamin-V: Virtual Environment and Tool-boxing for Trustworthy Development of RISC-V based Cloud Services.

CoRR, 2023

δLTA: Decoupling Camera Sampling from Processing to Avoid Redundant Computations in the Vision Pipeline.

Pedro Henrique Exenberger Becker

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

K-D Bonsai: ISA-Extensions to Compress K-D Trees for Autonomous Driving Tasks.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

VITAMIN-V: Virtual Environment and Tool-Boxing for Trustworthy Development of RISC-V Based Cloud Services.

Proceedings of the 26th Euromicro Conference on Digital System Design, 2023

Exploiting Kernel Compression on BNNs.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

SLIDEX: Sliding Window Extension for Image Processing.

Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques, 2023

2022

Energy-Efficient Stream Compaction Through Filtering and Coalescing Accesses in GPGPU Memory Partitions.

Albert Segura

IEEE Trans. Computers, 2022

E-BATCH: Energy-Efficient and High-Throughput RNN Batching.

ACM Trans. Archit. Code Optim., 2022

CREW: Computation reuse and efficient weight storage for hardware-accelerated MLPs and RNNs.

J. Syst. Archit., 2022

DNN pruning with principal component analysis and connection importance estimation.

J. Syst. Archit., 2022

Saving RNN Computations with a Neuron-Level Fuzzy Memoization Scheme.

CoRR, 2022

Mixture-of-Rookies: Saving DNN Computations by Predicting ReLU Outputs.

CoRR, 2022

ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition.

CoRR, 2022

2021

Exploiting Beam Search Confidence for Energy-Efficient Speech Recognition.

CoRR, 2021

A Low-Power Hardware Accelerator for ORB Feature Extraction in Self-Driving Cars.

Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

2020

LAWS: Locality-AWare Scheme for Automatic Speech Recognition.

IEEE Trans. Computers, 2020

Design and Evaluation of an Ultra Low-power Human-quality Speech Recognition System.

Pedro Henrique Exenberger Becker

ACM Trans. Archit. Code Optim., 2020

Demystifying Power and Performance Bottlenecks in Autonomous Driving Systems.

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Boosting LSTM Performance Through Dynamic Precision Selection.

Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019

A Low-Power, High-Performance Speech Recognition Accelerator.

IEEE Trans. Computers, 2019

CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference.

IEEE Micro, 2019

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.

CoRR, 2019

(Pen-) Ultimate DNN Pruning.

CoRR, 2019

Neuron-Level Fuzzy Memoization in RNNs.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

SCU: a GPU stream compaction unit for graph processing.

Albert Segura

Proceedings of the 46th International Symposium on Computer Architecture, 2019

POSTER: Leveraging Run-Time Feedback for Efficient ASR Acceleration.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Performance Analysis and Optimization of Automatic Speech Recognition.

IEEE Trans. Multi Scale Comput. Syst., 2018

The Dark Side of DNN Pruning.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Computation Reuse in DNNs by Exploiting Input Similarity.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

A Novel Register Renaming Technique for Out-of-Order Processors.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

E-PUR: an energy-efficient processing unit for recurrent neural networks.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Low-Power Automatic Speech Recognition Through a Mobile GPU and a Viterbi Accelerator.

IEEE Micro, 2017

UNFOLD: a memory-efficient speech recognizer using on-the-fly WFST composition.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

An Ultra Low-Power Hardware Accelerator for Acoustic Scoring in Speech Recognition.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

An ultra low-power hardware accelerator for automatic speech recognition.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015

Energy-efficient mobile GPU systems.

PhD thesis, 2015

2014

Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

2013

TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems.

Proceedings of the International Conference on Supercomputing, 2013

Parallel frame rendering: Trading responsiveness for energy on a mobile GPU.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Boosting mobile GPU performance with a decoupled access/execute fragment processor.
