Nuno Roma

IEEE Des. Test, October, 2023

Trading Performance, Power, and Area on Low-Precision Posit MAC Units for CNN Training.

Proceedings of the 35th IEEE International Symposium on Computer Architecture and High Performance Computing, 2023

GPU Acceleration of MIP Intra Prediction in VVC.

Proceedings of the 31st European Signal Processing Conference, 2023

Neural Network Predictor for Fast Channel Change on DVB Set-Top-Boxes.

Proceedings of the Design and Architecture for Signal and Image Processing, 2023

Supporting RISC-V Performance Counters Through Linux Performance Analysis Tools.

Proceedings of the 34th IEEE International Conference on Application-specific Systems, 2023

2022

Unified Posit/IEEE-754 Vector MAC Unit for Transprecision Computing.

IEEE Trans. Circuits Syst. II Express Briefs, 2022

Compiling for Vector Extensions With Stream-Based Specialization.

IEEE Micro, 2022

Decoupling GPGPU voltage-frequency scaling for deep-learning applications.

Francisco Mendes

J. Parallel Distributed Comput., 2022

gem5-ndp: Near-Data Processing Architecture Simulation From Low Level Caches to DRAM.

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Early prototyping and testing of CERN LHC CMS high-granularity calorimeter slow-control system.

Proceedings of the IEEE International Workshop on Rapid System Prototyping, 2022

Mode-Adaptive Subsampling of SAD/SSE Operations for Intra Prediction Cost Reduction.

Marcel Moscarelli Corrêa

Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

2021

A Compute Cache System for Signal Processing Applications.

J. Signal Process. Syst., 2021

A Reconfigurable Posit Tensor Unit with Variable-Precision Arithmetic and Automatic Data Streaming.

J. Signal Process. Syst., 2021

Compiler-Assisted Data Streaming for Regular Code Structures.

IEEE Trans. Computers, 2021

Fast and energy-efficient approximate motion estimation architecture for real-time 4K UHD processing.

J. Real Time Image Process., 2021

Unlimited Vector Extension with Data Streaming Support.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit.

Gonçalo Raposo

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

UHD 8K energy-quality scalable HEVC intra-prediction SAD unit hardware using optimized and configurable imprecise adders.

Marcel Moscarelli Corrêa

J. Real Time Image Process., 2020

Dynamic Fused Multiply-Accumulate Posit Unit with Variable Exponent Size for Low-Precision DSP Applications.

Proceedings of the IEEE Workshop on Signal Processing Systems, 2020

2PSA: An Optimized and Flexible Power-Precision Scalable Adder.

Luciano Agostini

Proceedings of the 33rd Symposium on Integrated Circuits and Systems Design, 2020

Exploiting Non-conventional DVFS on GPUs: Application to Deep Learning.

Francisco Mendes

Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Processing Convolutional Neural Networks on Cache.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic.

Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

2019

Modeling and Decoupling the GPU Power Consumption for Cross-Domain DVFS.

IEEE Trans. Parallel Distributed Syst., 2019

DVFS-aware application classification to improve GPGPUs energy efficiency.

Parallel Comput., 2019

Flying tourist problem: Flight time and cost minimization in complex routes.

Rafael Marques

Luís M. S. Russo

Expert Syst. Appl., 2019

GPU Static Modeling Using PTX and Deep Structured Learning.

IEEE Access, 2019

Power-Efficient Approximate SAD Architecture with LOA Imprecise Adders.

Luciano Agostini

Proceedings of the 10th IEEE Latin American Symposium on Circuits & Systems, 2019

Heart Disease Detection Architecture for Lead I Off-the-Person ECG Monitoring Devices.

Proceedings of the 27th European Signal Processing Conference, 2019

2018

Stream data prefetcher for the GPU memory interface.

J. Supercomput., 2018

Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU.

Biao Wang

Gabriel Falcão Paiva Fernandes

Mauricio Alvarez-Mesa

Signal Process. Image Commun., 2018

Exploiting Compute Caches for Memory Bound Vector Operations.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

GPGPU Power Modeling for Multi-domain Voltage-Frequency Scaling.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Adaptive In-Cache Streaming for Efficient Data Management.

IEEE Trans. Very Large Scale Integr. Syst., 2017

GHEVC: An Efficient HEVC Decoder for Graphics Processing Units.

IEEE Trans. Multim., 2017

Special issue on real-time energy-aware circuits and systems for HEVC and for its 3D and SVC extensions.

J. Real Time Image Process., 2017

GPU Parallelization of HEVC In-Loop Filters.

Biao Wang

Mauricio Alvarez-Mesa

Int. J. Parallel Program., 2017

Efficient parallelization of perturbative Monte Carlo QM/MM simulations in heterogeneous platforms.

Int. J. High Perform. Comput. Appl., 2017

Energy-efficient motion estimation with approximate arithmetic.

Luciano Volcan Agostini

Proceedings of the 19th IEEE International Workshop on Multimedia Signal Processing, 2017

2016

Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems.

IEEE Trans. Circuits Syst. Video Technol., 2016

BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.

David Nogueira

IEEE ACM Trans. Comput. Biol. Bioinform., 2016

GPU-assisted HEVC intra decoder.

J. Real Time Image Process., 2016

Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platforms.

J. Real Time Image Process., 2016

Multi-objective kernel mapping and scheduling for morphable many-core architectures.

Expert Syst. Appl., 2016

Editorial to special issue on energy efficient architectures for embedded systems.

José L. Núñez-Yáñez

EURASIP J. Embed. Syst., 2016

A Cross-Core Performance Model for Heterogeneous Many-Core Architectures.

Rui Pinheiro

Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

Efficient HEVC decoder for heterogeneous CPU with GPU systems.

Biao Wang

Mauricio Alvarez-Mesa

Proceedings of the 18th IEEE International Workshop on Multimedia Signal Processing, 2016

Unsupervised variable-grained online phase clustering for heterogeneous/morphable processors.

Miguel Tairum Cruz

Proceedings of the International Conference on High Performance Computing & Simulation, 2016

In-Cache Streaming: Morphable Infrastructure for Many-Core Processing Systems.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Performance and Power-Aware Classification for Frequency Scaling of GPGPU Applications.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

2015

Multicore SIMD ASIP for Next-Generation Sequencing and Alignment Biochip Platforms.

David Martins de Matos

IEEE Trans. Very Large Scale Integr. Syst., 2015

Morphable hundred-core heterogeneous architecture for energy-aware computation.

IET Comput. Digit. Tech., 2015

Acceleration of stochastic seismic inversion in OpenCL-based heterogeneous platforms.

Comput. Geosci., 2015

Implementation and performance analysis of efficient index structures for DNA search algorithms in parallel platforms.

Gustavo Encarnação

Concurr. Comput. Pract. Exp., 2015

HEVC in-loop filters GPU parallelization in embedded systems.

Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Multi-kernel Auto-Tuning on GPUs: Performance and Energy-Aware Optimization.

Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Energy-Efficient Architecture for DP Local Sequence Alignment: Exploiting ILP and DLP.

Miguel Tairum Cruz

Proceedings of the Bioinformatics and Biomedical Engineering, 2015

Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision.

Proceedings of the 2015 IEEE International Symposium on Multimedia, 2015

High performance IP core for HEVC quantization.

Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Towards GPU HEVC intra decoding: Seizing fine-grain parallelism.

Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

GPU acceleration of the HEVC decoder inter prediction module.

Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

Efficient data-stream management for shared-memory many-core systems.

Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Fast and Scalable Thread Migration for Multi-core Architectures.

Miguel Rodrigues

Proceedings of the 13th IEEE International Conference on Embedded and Ubiquitous Computing, 2015

2014

Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems.

IEEE Trans. Multim., 2014

Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs.

EURASIP J. Adv. Signal Process., 2014

Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.

Miguel Ferreira

Luís M. S. Russo

BMC Bioinform., 2014

Stream Oriented Modular Architecture with Polymorphic Processing Engines.

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Collaborative inter-prediction on CPU+GPU systems.

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Cooperative CPU+GPU deblocking filter parallelization for high performance HEVC video codecs.

Leonel Augusto Sousa

Proceedings of the IEEE International Conference on Acoustics, 2014

Optimized ASIP architecture for compressed BWT-indexed search in bioinformatics applications.

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform.

David Nogueira

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Low-power vectorial VLIW architecture for maximum parallelism exploitation of dynamic programming algorithms.

Miguel Tairum Cruz

Proceedings of the International Conference on High Performance Computing & Simulation, 2014

OpenCL parallelization of the HEVC de-quantization and inverse transform for heterogeneous platforms.

Proceedings of the 22nd European Signal Processing Conference, 2014

GPU Accelerated Stochastic Inversion of Deep Water Seismic Data.

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013

Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems.

Int. J. Parallel Program., 2013

Configurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment.

Concurr. Comput. Pract. Exp., 2013

HotStream: Efficient Data Streaming of Complex Patterns to Multiple Accelerating Kernels.

Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems.

Proceedings of the Parallel Processing and Applied Mathematics, 2013

A flexible shared library profiler for early estimation of performance gains in heterogeneous systems.

Proceedings of the International Conference on High Performance Computing & Simulation, 2013

Scalable and high throughput biosensing platform.

Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

High performance multi-standard architecture for DCT computation in H.264/AVC High Profile and HEVC codecs.

Proceedings of the 2013 Conference on Design and Architectures for Signal and Image Processing, 2013

BioBlaze: Multi-core SIMD ASIP for DNA sequence alignment.

Andre Patricio

David Martins de Matos

Proceedings of the 24th International Conference on Application-Specific Systems, 2013

2012

Integrated Hardware Architecture for Efficient Computation of the $n$-Best Bio-Sequence Local Alignments in Embedded Platforms.

IEEE Trans. Very Large Scale Integr. Syst., 2012

Hardware accelerator architecture for simultaneous short-read DNA sequences alignment with enhanced traceback phase.

Microprocess. Microsystems, 2012

System-level prototyping framework for heterogeneous multi-core architecture applied to biological sequence analysis.

Pedro Magalhães

Proceedings of the 23rd IEEE International Symposium on Rapid System Prototyping, 2012

Multi-level Parallelization of Advanced Video Coding on Hybrid CPU+GPU Platforms.

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

High Performance Unified Architecture for Forward and Inverse Quantization in H.264/AVC.

Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

2011

A flexible architecture for the computation of direct and inverse transforms in H.264/AVC video codecs.

IEEE Trans. Consumer Electron., 2011

A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing.

Signal Process., 2011

High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systems.

Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays.

Gustavo Encarnação

Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

2010

p264: open platform for designing parallel H.264/AVC video encoders on multi-core systems.

António Rodrigues

Proceedings of the Network and Operating System Support for Digital Audio and Video, 2010

H.264/AVC framework for multi-core embedded video encoders.

Proceedings of the 2010 International Symposium on System on Chip, SoC 2010, Tampere, 2010

Integrated accelerator architecture for DNA sequences alignment with enhanced traceback phase.

Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

Hardware/software co-design of H.264/AVC encoders for multi-core embedded systems.

Tiago Jose Barreiros Martins de Almeida

Proceedings of the 2010 Conference on Design & Architectures for Signal & Image Processing, 2010

A Parallel Programming Framework for Multi-core DNA Sequence Alignment.

Nuno Filipe Valentim Roma

Proceedings of the CISIS 2010, 2010

2009

Distributed Software Platform for Automation and Control of General Anaesthesia.

Gesner Passos

Bertinho Andrade da Costa

João Miranda Lemos

Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009

2008

Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units.

Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

2007

Reconfigurable architectures and processors for real-time video motion estimation.

J. Real Time Image Process., 2007

Adaptive Motion Estimation Processor for Autonomous Video Devices.

Nuno Filipe Valentim Roma

EURASIP J. Embed. Syst., 2007

Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling.

EURASIP J. Adv. Signal Process., 2007

Adaptive Motion Estimation Algorithm for H.264/AVC.

Proceedings of the 15th International Conference on Digital Signal Processing, 2007

2006

Low Power Distance Measurement Unit for Real-Time Hardware Motion Estimators.

Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2006

Application Specific Instruction Set Processor for Adaptive Video Motion Estimation.

Proceedings of the Ninth Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD 2006), 30 August, 2006

2005

Efficient VLSI Architecture for Real-Time Motion Estimation in Advanced Video Coding.

Proceedings of the 2005 IEEE International SOC Conference, 2005

Least squares motion estimation algorithm in the compressed DCT domain for H.26x/MPEG-x video sequences.

Proceedings of the Advanced Video and Signal Based Surveillance, 2005

2003

Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures.

J. VLSI Signal Process., 2003

Fast transcoding architectures for insertion of non-regular shaped objects in the compressed DCT-domain.

Signal Process. Image Commun., 2003

Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs.

Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

2002

Efficient and configurable full-search block-matching processors.

IEEE Trans. Circuits Syst. Video Technol., 2002

Insertion of irregular-shaped logos in the compressed DCT domain.

Proceedings of the 14th International Conference on Digital Signal Processing, 2002

2001

A New Efficient VLSI Architecture for Full Search Block Matching Motion Estimation.

Proceedings of the SOC Design Methodologies, 2001

2000

On the Development and Evaluation of Specialized Processors for Computing High-Order 2-D Image Moments in Real-Time.

Proceedings of the Fifth International Workshop on Computer Architectures for Machine Perception (CAMP 2000), 2000

1999

Low-power array architectures for motion estimation.
