# Leonel Sousa

According to our database, Leonel Sousa authored at least 236 papers between 1997 and 2018.

## Bibliography

### 2018
Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU.
Sig. Proc.: Image Comm., 2018

Temperature-aware dynamic voltage and frequency scaling enabled MPSoC modeling using Stochastic Activity Networks.
Microprocessors and Microsystems - Embedded Hardware Design, 2018

Guest Editors' Introduction.
Int. J. Semantic Computing, 2018

MrBayes sMC3.
IJHPCA, 2018

A Survey on Fully Homomorphic Encryption: An Engineering Perspective.
ACM Comput. Surv., 2018

Beamformed Fingerprint Learning for Accurate Millimeter Wave Positioning.
CoRR, 2018

Performability-Based Workflow Scheduling in Grids.
Comput. J., 2018

Analysis of Scheduling Policies in Metaheuristics for Evolutionary Biology.
Proceedings of the 6th International Workshop on Parallelism in Bioinformatics, 2018

Towards Efficient Modular Adders based on Reversible Circuits.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Configurable N-fold Hardware Architecture for Convolutional Neural Networks.
Proceedings of the 2018 International Conference on Biomedical Engineering and Applications, 2018

Data-Aided Fast Beamforming Selection for 5G.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

### 2017
An Efficient Component for Designing Signed Reverse Converters for a Class of RNS Moduli Sets of Composite Form $\{2^k, 2^P-1\}$.
IEEE Trans. VLSI Syst., 2017

GHEVC: An Efficient HEVC Decoder for Graphics Processing Units.
IEEE Trans. Multimedia, 2017

A Reduced-Bias Approach With a Lightweight Hard-Multiple Generator to Design a Radix-8 Modulo $2^n + 1$ Multiplier.
IEEE Trans. on Circuits and Systems, 2017

Arithmetical Improvement of the Round-Off for Cryptosystems in High-Dimensional Lattices.
IEEE Trans. Computers, 2017

Beyond the Roofline: Cache-Aware Power and Energy-Efficiency Modeling for Multi-Cores.
IEEE Trans. Computers, 2017

Special issue on real-time energy-aware circuits and systems for HEVC and for its 3D and SVC extensions.
J. Real-Time Image Processing, 2017

Performance and power modeling and evaluation of virtualized servers in IaaS clouds.
Inf. Sci., 2017

GPU Parallelization of HEVC In-Loop Filters.
International Journal of Parallel Programming, 2017

Efficient reductions in cyclotomic rings - Application to R-LWE based FHE schemes.
IACR Cryptology ePrint Archive, 2017

Cache-aware Roofline Model in Intel® Advisor.
ERCIM News, 2017

Sign Detection and Number Comparison on RNS 3-Moduli Sets $\{2^n-1, 2^{n+x}, 2^n+1\}$.
CSSP, 2017

Accelerating the phylogenetic parsimony function on heterogeneous systems.
Concurrency and Computation: Practice and Experience, 2017

Energy-aware mechanism for stencil-based MPDATA algorithm with constraints.
Concurrency and Computation: Practice and Experience, 2017

A Multifunctional Unit for Designing Efficient RNS-Based Datapaths.
IEEE Access, 2017

Design Space Exploration of LDPC Decoders Using High-Level Synthesis.
IEEE Access, 2017

A stochastic number representation for fully homomorphic cryptography.
Proceedings of the 2017 IEEE International Workshop on Signal Processing Systems, 2017

Modeling Large Compute Nodes with Heterogeneous Memories with Cache-Aware Roofline Model.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2017

Pipelined FPGA coprocessor for elliptic curve cryptography based on residue number system.
Proceedings of the 2017 International Conference on Embedded Computer Systems: Architectures, 2017

Efficient Reductions in Cyclotomic Rings - Application to Ring-LWE Based FHE Schemes.
Proceedings of the Selected Areas in Cryptography - SAC 2017, 2017

Energy-efficient motion estimation with approximate arithmetic.
Proceedings of the 19th IEEE International Workshop on Multimedia Signal Processing, 2017

Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Analyzing Performance of Multi-cores and Applications with Cache-aware Roofline Model.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Performance Analysis with Cache-Aware Roofline Model in Intel Advisor.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

On Boosting Energy-Efficiency of Heterogeneous Embedded Systems via Game Theory.
Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 2017

TrustZone-backed bitcoin wallet.
Proceedings of the Fourth Workshop on Cryptography and Security in Computing Systems, 2017

### 2016
Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems.
IEEE Trans. Circuits Syst. Video Techn., 2016

A Framework for Application-Guided Task Management on Heterogeneous Embedded Systems.
TACO, 2016

GPU-assisted HEVC intra decoder.
J. Real-Time Image Processing, 2016

Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platforms.
J. Real-Time Image Processing, 2016

Method for designing two levels RNS reverse converters for large dynamic ranges.
Integration, 2016

Guest Editors' Introduction.
Int. J. Semantic Computing, 2016

Ubiquitous Multimedia: Emerging Research on Multimedia Computing.
IEEE MultiMedia, 2016

A Survey on Programmable LDPC Decoders.
IEEE Access, 2016

HPC on the Intel Xeon Phi: Homomorphic Word Searching.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

Efficient HEVC decoder for heterogeneous CPU with GPU systems.
Proceedings of the 18th IEEE International Workshop on Multimedia Signal Processing, 2016

Area-delay-power-aware adder placement method for RNS reverse converter design.
Proceedings of the IEEE 7th Latin American Symposium on Circuits & Systems, 2016

Enhancing Data Parallelism of Fully Homomorphic Encryption.
Proceedings of the Information Security and Cryptology - ICISC 2016 - 19th International Conference, Seoul, South Korea, November 30, 2016

High-Level Designs of Complex FIR Filters on FPGAs for the SKA.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

### 2015
Reverse Converter Design via Parallel-Prefix Adders: Novel Components, Methodology, and Implementations.
IEEE Trans. VLSI Syst., 2015

Arithmetic-Based Binary-to-RNS Converter Modulo $\{2^n \pm k\}$ for jn-bit Dynamic Range.
IEEE Trans. VLSI Syst., 2015

Base Transformation With Injective Residue Mapping for Dynamic Range Reduction in RNS.
IEEE Trans. on Circuits and Systems, 2015

$2^n$ RNS Scalers for Extended 4-Moduli Sets.
IEEE Trans. Computers, 2015

Real-time implementation of remotely sensed hyperspectral image unmixing on GPUs.
J. Real-Time Image Processing, 2015

Attaining performance fairness in big.LITTLE systems.
Proceedings of the 12th International Workshop on Intelligent Solutions in Embedded Systems, 2015

Accelerating Phylogenetic Inference on Heterogeneous OpenCL Platforms.
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015

HEVC in-loop filters GPU parallelization in embedded systems.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision.
Proceedings of the 2015 IEEE International Symposium on Multimedia, 2015

Featuring Immediate Revocation in Mikey-Sakke (FIRM).
Proceedings of the 2015 IEEE International Symposium on Multimedia, 2015

RNS reverse converters based on the new Chinese Remainder Theorem I.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

High performance IP core for HEVC quantization.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Towards GPU HEVC intra decoding: Seizing fine-grain parallelism.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

Stretching the limits of Programmable Embedded Devices for Public-key Cryptography.
Proceedings of the Second Workshop on Cryptography and Security in Computing Systems, 2015

GPU acceleration of the HEVC decoder inter prediction module.
Proceedings of the 2015 IEEE Global Conference on Signal and Information Processing, 2015

Programmable RNS lattice-based parallel cryptographic decryption.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

### 2014
An Efficient Scalable RNS Architecture for Large Dynamic Ranges.
Signal Processing Systems, 2014

A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS.
Signal Processing Systems, 2014

Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems.
IEEE Trans. Multimedia, 2014

Efficient Method for Designing Modulo $\{2^n \pm k\}$ Multipliers.
Journal of Circuits, Systems, and Computers, 2014

Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs.
EURASIP J. Adv. Sig. Proc., 2014

Method for Designing Efficient Mixed Radix Multipliers.
CSSP, 2014

Cache-aware Roofline model: Upgrading the loft.
Computer Architecture Letters, 2014

On the Evaluation of Multi-core Systems with SIMD Engines for Public-Key Cryptography.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

Performance-Aware Task Management and Frequency Scaling in Embedded Systems.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Accelerating Phylogenetic Inference on GPUs: an OpenACC and CUDA comparison.
Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2014

ROM-less RNS-to-binary converter moduli $\{2^{2n}-1, 2^{2n}+1, 2^n-3, 2^n+3\}$.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

Method for designing multi-channel RNS architectures to prevent power analysis SCA.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Collaborative inter-prediction on CPU+GPU systems.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Reconfigurable data flow engine for HEVC motion estimation.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Cooperative CPU+GPU deblocking filter parallelization for high performance HEVC video codecs.
Proceedings of the IEEE International Conference on Acoustics, 2014

OpenCL parallelization of the HEVC de-quantization and inverse transform for heterogeneous platforms.
Proceedings of the 22nd European Signal Processing Conference, 2014

Nonlinear system identification using constellation based multiple model adaptive estimators.
Proceedings of the 22nd European Signal Processing Conference, 2014

SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Combining flexibility with low power: Dataflow and wide-pipeline LDPC decoding engines in the Gbit/s era.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

### 2013
On the Design of RNS Reverse Converters for the Four-Moduli Set $\{2^n+1, 2^n-1, 2^n, 2^{n+1}+1\}$.
IEEE Trans. VLSI Syst., 2013

A Lab Project on the Design and Implementation of Programmable and Configurable Embedded Systems.
IEEE Trans. Education, 2013

Method to Design General RNS Reverse Converters for Extended Moduli Sets.
IEEE Trans. on Circuits and Systems, 2013

RNS Reverse Converters for Moduli Sets With Dynamic Ranges up to (8n+1)-bit.
IEEE Trans. on Circuits and Systems, 2013

The CRNS framework and its application to programmable and reconfigurable cryptography.
TACO, 2013

Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems.
International Journal of Parallel Programming, 2013

Randomised multi-modulo residue number system architecture for double-and-add to prevent power analysis side channel attacks.
IET Circuits, Devices & Systems, 2013

Monitoring Performance and Power for Application Characterization with the Cache-Aware Roofline Model.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Stressing the BER simulation of LDPC codes in the error floor region using GPU clusters.
Proceedings of the ISWCS 2013, 2013

A comparison of computing architectures and parallelization frameworks based on a two-dimensional FDTD.
Proceedings of the International Conference on High Performance Computing & Simulation, 2013

An RNS-based architecture targeting hardware accelerators for modular arithmetic.
Proceedings of the IEEE International Conference on Acoustics, 2013

Open the Gates: Using High-level Synthesis towards programmable LDPC decoders on FPGAs.
Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

High performance multi-standard architecture for DCT computation in H.264/AVC High Profile and HEVC codecs.
Proceedings of the 2013 Conference on Design and Architectures for Signal and Image Processing, 2013

DARNS: A randomized multi-modulo RNS architecture for double-and-add in ECC to prevent power analysis side channel attacks.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

A compact and scalable RNS architecture.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

### 2012
Corrections to "MRC-Based RNS Reverse Converters for the Four-Moduli Sets $\{2^n+1, 2^n-1, 2^n, 2^{2n+1}-1\}$ and $\{2^n+1, 2^n-1, 2^{2n}, 2^{2n+1}-1\}$".
IEEE Trans. on Circuits and Systems, 2012

MRC-Based RNS Reverse Converters for the Four-Moduli Sets $\{2^n+1, 2^n-1, 2^n, 2^{2n+1}-1\}$ and $\{2^n+1, 2^n-1, 2^{2n}, 2^{2n+1}-1\}$.
IEEE Trans. on Circuits and Systems, 2012

Portable LDPC Decoding on Multicores Using OpenCL [Applications Corner].
IEEE Signal Process. Mag., 2012

Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems.
Parallel Computing, 2012

Computation of Induced Dipoles in Molecular Mechanics Simulations Using Graphics Processors.
Journal of Chemical Information and Modeling, 2012

Configurable M-factor VLSI DVB-S2 LDPC decoder architecture with optimized memory tiling design.
EURASIP J. Wireless Comm. and Networking, 2012

RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures.
Comput. J., 2012

Energy efficient stream-based configurable architecture for embedded platforms.
Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

On Realistic Divisible Load Scheduling in Highly Heterogeneous Distributed Systems.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Simultaneous Multi-Level Divisible Load Balancing for Heterogeneous Desktop Systems.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Multi-level Parallelization of Advanced Video Coding on Hybrid CPU+GPU Platforms.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

VLSI Reverse Converter for RNS Based on the Moduli Set.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

RNS Arithmetic Units for Modulo $\{2^n \pm k\}$.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

High Performance Unified Architecture for Forward and Inverse Quantization in H.264/AVC.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

Efficient implementation of multi-moduli architectures for Binary-to-RNS conversion.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

### 2011
Modeling and Evaluating Non-shared Memory CELL/BE Type Multi-core Architectures for Local Image and Video Processing.
Signal Processing Systems, 2011

Massively LDPC Decoding on Multicore Architectures.
IEEE Trans. Parallel Distrib. Syst., 2011

A flexible architecture for the computation of direct and inverse transforms in H.264/AVC video codecs.
IEEE Trans. Consumer Electronics, 2011

A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing.
Signal Processing, 2011

Parallel Computing - Special Issue.
Parallel Computing, 2011

CHPS: An Environment for Collaborative Execution on Heterogeneous Desktop Systems.
IJNC, 2011

High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systems.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Real-time DVB-S2 LDPC decoding on many-core GPU accelerators.
Proceedings of the IEEE International Conference on Acoustics, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Scheduling Divisible Loads on Heterogeneous Desktop Systems with Limited Memory.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Binary-to-RNS Conversion Units for moduli $\{2^n \pm 3\}$.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

Virtualization for Morphable Multi-Cores.
Proceedings of the ARCS 2011, 2011

### 2010
Measuring and Extraction of Biological Information on New Handheld Biochip-Based Microsystem.
IEEE Trans. Instrumentation and Measurement, 2010

On the Modeling of New Tunnel Junction Magnetoresistive Biosensors.
IEEE Trans. Instrumentation and Measurement, 2010

A quantitative analysis of firing rate estimators: Unveiling bias sources.
Neurocomputing, 2010

An improved RNS generator $2^n \pm k$ based on threshold logic.
Proceedings of the 18th IEEE/IFIP VLSI-SoC 2010, 2010

Unifying stream based and reconfigurable computing to design application accelerators.
Proceedings of the 18th IEEE/IFIP VLSI-SoC 2010, 2010

Embedded multicore architectures for LDPC decoding.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Programming Cell/BE and GPUs systems for real-time video encoding.
Proceedings of the Real-Time Image and Video Processing 2010, 2010

p264: open platform for designing parallel H.264/AVC video encoders on multi-core systems.
Proceedings of the Network and Operating System Support for Digital Audio and Video, 2010

An improved RNS reverse converter for the $\{2^{2n+1}-1, 2^n, 2^n-1\}$ moduli set.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Collaborative execution environment for heterogeneous parallel systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Exploiting SIMD extensions for linear image processing with OpenCL.
Proceedings of the 28th International Conference on Computer Design, 2010

High-Performance Computing on Heterogeneous Systems: Database Queries on CPU and GPU.
Proceedings of the High Performance Computing: From Grids and Clouds to Exascale, 2010

Arithmetic Units for RNS Moduli $\{2^n-3\}$ and $\{2^n+3\}$ Operations.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

Hardware/software co-design of H.264/AVC encoders for multi-core embedded systems.
Proceedings of the 2010 Conference on Design & Architectures for Signal & Image Processing, 2010

Iterative induced dipoles computation for molecular mechanics on GPUs.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

Elliptic Curve point multiplication on GPUs.
Proceedings of the 21st IEEE International Conference on Application-specific Systems Architectures and Processors, 2010

Efficient Independent Component Analysis on a GPU.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

### 2009
A Feature Selection Algorithm for the Regularization of Neuron Models.
IEEE Trans. Instrumentation and Measurement, 2009

A Portable and Autonomous Magnetic Detection Platform for Biosensing.
Sensors, 2009

Modelling and programming stream-based distributed computing based on the meta-pipeline approach.
IJPEDS, 2009

Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach.
J. Comput. Sci. Technol., 2009

Neural code metrics: Analysis and application to the assessment of neural models.
Neurocomputing, 2009

Development and evaluation of scalable video motion estimators on GPU.
Proceedings of the IEEE Workshop on Signal Processing Systems, 2009

Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study.
Proceedings of the Embedded Computer Systems: Architectures, 2009

On the design of distributed autonomous embedded systems for biomedical applications.
Proceedings of the 3rd International Conference on Pervasive Computing Technologies for Healthcare, 2009

CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications.
Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009

Distributed Software Platform for Automation and Control of General Anaesthesia.
Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009

How GPUs can outperform ASICs for fast LDPC decoding.
Proceedings of the 23rd international conference on Supercomputing, 2009

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function.
Proceedings of the ICPP 2009, 2009

Multi-core platforms for signal processing: source and channel coding.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Parallel LDPC Decoding on the Cell/B.E. Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Compact and Flexible Microcoded Elliptic Curve Processor for Reconfigurable Devices.
Proceedings of the FCCM 2009, 2009

Preface.
Proceedings of the Euro-Par 2009, 2009

### 2008
Cost-Efficient SHA Hardware Accelerators.
IEEE Trans. VLSI Syst., 2008

Statistical Analysis of a Spike Train Distance in Poisson Models.
IEEE Signal Process. Lett., 2008

Parallel Advanced Video Coding: Motion Estimation on Multi-cores.
Scalable Computing: Practice and Experience, 2008

Massive parallel LDPC decoding on GPU.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Edge Stream Oriented LDPC Decoding.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs.
Proceedings of the 7th International Symposium on Parallel and Distributed Computing (ISPDC 2008), 2008

Distributed Web-based Platform for Computer Architecture Simulation.
Proceedings of the 7th International Symposium on Parallel and Distributed Computing (ISPDC 2008), 2008

Design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

BRAM-LUT Tradeoff on a Polymorphic DES Design.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Efficient FPGA elliptic curve cryptographic processor over $GF(2^m)$.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

On-the-fly attestation of reconfigurable hardware.
Proceedings of the FPL 2008, 2008

Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

An RNS based Specific Processor for Computing the Minimum Sum-of-Absolute-Differences.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Merged Computation for Whirlpool Hashing.
Proceedings of the Design, Automation and Test in Europe, 2008

A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures.
Proceedings of the Second International Conference on Complex, 2008

Low power microarchitecture with instruction reuse.
Proceedings of the 5th Conference on Computing Frontiers, 2008

Towards a Unified Model for the Retina - Static vs Dynamic Integrate and Fire Models.
Proceedings of the First International Conference on Biomedical Electronics and Devices, 2008

### 2007
Reconfigurable architectures and processors for real-time video motion estimation.
J. Real-Time Image Processing, 2007

Improving residue number system multiplication with more balanced moduli sets and enhanced modular arithmetic structures.
IET Computers & Digital Techniques, 2007

Embedded Systems for Portable and Mobile Video Platforms.
EURASIP J. Emb. Sys., 2007

Adaptive Motion Estimation Processor for Autonomous Video Devices.
EURASIP J. Emb. Sys., 2007

Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling.
EURASIP J. Adv. Sig. Proc., 2007

Caravela: A Novel Stream-Based Distributed Computing Environment.
IEEE Computer, 2007

Developing and Integrating Lab Projects as Important Learning Components in an Embedded Systems Course.
Proceedings of the IEEE International Conference on Microelectronic Systems Education, 2007

Meta-Pipeline: A New Execution Mechanism for Distributed Pipeline Processing.
Proceedings of the 6th International Symposium on Parallel and Distributed Computing (ISPDC 2007), 2007

A New Handheld Biochip-based Microsystem.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

Generic Architecture Designed for Biomedical Embedded Systems.
Proceedings of the Embedded System Design: Topics, Techniques and Trends, IFIP TC10 Working Conference: International Embedded Systems Symposium (IESS), May 30, 2007

Additive Logistic Regression Applied to Retina Modelling.
Proceedings of the International Conference on Image Processing, 2007

A Run-time Reconfigurable Processor for Video Motion Estimation.
Proceedings of the FPL 2007, 2007

Stochastic integrate-and-fire model for the retina.
Proceedings of the 15th European Signal Processing Conference, 2007

Data buffering optimization methods toward a uniform programming interface for GPU-based applications.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Design and implementation of a stream-based distributed computing platform using graphics processing units.
Proceedings of the 4th Conference on Computing Frontiers, 2007

Efficient Method for Magnitude Comparison in RNS Based on Two Pairs of Conjugate Moduli.
Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 2007

### 2006
Toward a Realistic Task Scheduling Model.
IEEE Trans. Parallel Distrib. Syst., 2006

A New Hand-Held Microsystem Architecture for Biological Analysis.
IEEE Trans. on Circuits and Systems, 2006

Maestro2: Experimental Evaluation of Communication Performance Improvement Techniques in the Link Layer.
Journal of Interconnection Networks, 2006

Rescheduling for Optimized SHA-1 Calculation.
Proceedings of the Embedded Computer Systems: Architectures, 2006

Low Power Distance Measurement Unit for Real-Time Hardware Motion Estimators.
Proceedings of the Integrated Circuit and System Design. Power and Timing Modeling, 2006

Reconfigurable memory based AES co-processor.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Application Specific Instruction Set Processor for Adaptive Video Motion Estimation.
Proceedings of the Ninth Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD 2006), 30 August, 2006

Improving SHA-2 Hardware Implementations.
Proceedings of the Cryptographic Hardware and Embedded Systems, 2006

Configurable Embedded Core for Controlling Electro-Mechanical Systems.
Proceedings of the Reconfigurable Computing: Architectures and Applications, 2006

### 2005
IEEE Trans. Parallel Distrib. Syst., 2005

Corrections to "A Universal Architecture for Designing Efficient Modulo $2^n+1$ Multipliers".
IEEE Trans. on Circuits and Systems, 2005

A universal architecture for designing efficient modulo $2^n+1$ multipliers.
IEEE Trans. on Circuits and Systems, 2005

Visual neuroprosthesis: a non invasive system for stimulating the cortex.
IEEE Trans. on Circuits and Systems, 2005

Efficient VLSI Architecture for Real-Time Motion Estimation in Advanced Video Coding.
Proceedings of the 2005 IEEE International SOC Conference, 2005

On the Implementation and Evaluation of Berkeley Sockets on Maestro2 cluster computing environment.
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), 2005

Least squares motion estimation algorithm in the compressed DCT domain for H.26x/MPEG-x video sequences.
Proceedings of the Advanced Video and Signal Based Surveillance, 2005

The Midlifekicker Microarchitecture Evaluation Metric.
Proceedings of the 16th IEEE International Conference on Application-Specific Systems, 2005

### 2004
On Task Scheduling Accuracy: Evaluation Methodology and Results.
The Journal of Supercomputing, 2004

List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures.
Parallel Computing, 2004

A programmable cellular neural network circuit.
Proceedings of the 17th Annual Symposium on Integrated Circuits and Systems Design, 2004

Task Scheduling: Considering the Processor Involvement in Communication.
Proceedings of the 3rd International Symposium on Parallel and Distributed Computing (ISPDC 2004), 2004

Distributed Shared Memory System Based on the Maestro2 High Performance Cluster Network.
Proceedings of the 3rd International Symposium on Parallel and Distributed Computing (ISPDC 2004), 2004

On the performance of Maestro2 high performance network equipment, using new improvement techniques.
Proceedings of the 23rd IEEE International Performance Computing and Communications Conference, 2004

$\{2^n+1, 2^{n+k}, 2^n-1\}$: A New RNS Moduli Set Extension.
Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

### 2003
Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures.
VLSI Signal Processing, 2003

Fast transcoding architectures for insertion of non-regular shaped objects in the compressed DCT-domain.
Sig. Proc.: Image Comm., 2003

An FPL Bioinspired Visual Encoding System to Stimulate Cortical Neurons in Real-Time.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs.
Proceedings of the Field Programmable Logic and Application, 13th International Conference, 2003

RDSP: A RISC DSP based on Residue Number System.
Proceedings of the 2003 Euromicro Symposium on Digital Systems Design (DSD 2003), 2003

### 2002
Efficient and configurable full-search block-matching processors.
IEEE Trans. Circuits Syst. Video Techn., 2002

### 2001
A New Efficient VLSI Architecture for Full Search Block Matching Motion Estimation.
Proceedings of the SOC Design Methodologies, 2001

Comparison of Contention Aware List Scheduling Heuristics for Cluster Computing.
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001

Scheduling Task Graphs on Arbitrary Processor Architectures Considering Contention.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

Exploiting Unused Time Slots in List Scheduling Considering Communication Contention.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

### 2000
Synchronous Non-local Image Processing on Orthogonal Multiprocessor Systems.
Proceedings of the Vector and Parallel Processing, 2000

A Platform Independent Parallelising Tool Based on Graph Theoretic Models.
Proceedings of the Vector and Parallel Processing, 2000

In the Development and Evaluation of Specialized Processors for Computing High-Order 2-D Image Moments in Real-Time.
Proceedings of the Fifth International Workshop on Computer Architectures for Machine Perception (CAMP 2000), 2000

### 1999
Low-power array architectures for motion estimation.
Proceedings of the Third IEEE Workshop on Multimedia Signal Processing, 1999

Applying Conditional Processing to Design Low-Power Array Processors for Motion Estimation.
Proceedings of the 1999 International Conference on Image Processing, 1999

On the Development of a Video CODEC for Low Bitrate Communication in General Purpose Computers.
Proceedings of the 17th IASTED International Conference on Applied Informatics, 1999

### 1997
A new orthogonal multiprocessor and its application to image processing.
Proceedings of the Fourth International Conference on High-Performance Computing, 1997