Shen-Fu Hsiao

ORCID: 0000-0002-4627-570X

According to our database, Shen-Fu Hsiao authored at least 80 papers between 1991 and 2022.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2022
A 40.96-GOPS 196.8-mW Digital Logic Accelerator Used in DNN for Underwater Object Recognition.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Dynamically Swappable Digit-Serial Multi-Precision Deep Neural Network Accelerator with Early Termination.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

A Power Effective DLA for PBs in Opto-Electrical Neural Network Architecture.
Proceedings of the IEEE Asia Pacific Conference on Circuit and Systems, 2022

2021
Efficient Quantization and Multi-Precision Design of Arithmetic Components for Deep Learning.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Comparison of Digit-Serial and Bit-Level Designs for Acceleration of Convolutional Neural Network Computation.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Quantization of Deep Neural Network Models Considering Per-Layer Computation Complexity for Efficient Execution in Multi-Precision Accelerators.
Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan, 2021

Efficient Computation of Depthwise Separable Convolution in MobileNet Deep Neural Network Models.
Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan, 2021

Multi-threaded System Design of A Multi-Precision Deep Learning Accelerator on FPGA with Optimized Memory Usage.
Proceedings of the IEEE International Conference on Consumer Electronics-Taiwan, 2021

2020
Design of a Sparsity-Aware Reconfigurable Deep Learning Accelerator Supporting Various Types of Operations.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2020

Flexible Multi-Precision Accelerator Design for Deep Convolutional Neural Networks Considering Both Data Computation and Communication.
Proceedings of the 2020 International Symposium on VLSI Design, Automation and Test, 2020

Hardware Efficient Function Computation Based on Optimized Piecewise Polynomial Approximation.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

Sparsity-Aware Deep Learning Accelerator Design Supporting CNN and LSTM Operations.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

2019
Dual-Precision Acceleration of Convolutional Neural Network Computation with Mixed Input and Output Data Reuse.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Low-Complexity Deep Neural Networks for Image Object Classification and Detection.
Proceedings of the 2019 IEEE Asia Pacific Conference on Circuits and Systems, 2019

Multi-Precision Table-Addition Designs for Computing Nonlinear Functions in Deep Neural Networks.
Proceedings of the 2019 IEEE Asia Pacific Conference on Circuits and Systems, 2019

2018
Hardware design of disparity computation for stereo vision using guided image filtering.
Proceedings of the 2018 International Symposium on VLSI Design, 2018

Optimization of Lookup Table Size in Table-Bound Design of Function Computation.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Design Tradeoff of Internal Memory Size and Memory Access Energy in Deep Neural Network Hardware Accelerators.
Proceedings of the IEEE 7th Global Conference on Consumer Electronics, 2018

Design and Implementation of Low-Cost LK Optical Flow Computation for Images of Single and Multiple Levels.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

Architectural Exploration of Function Computation Based on Cubic Polynomial Interpolation with Application in Deep Neural Networks.
Proceedings of the 21st Euromicro Conference on Digital System Design, 2018

2017
Hierarchical Multipartite Function Evaluation.
IEEE Trans. Computers, 2017

Hardware efficient implementation of histograms of oriented gradients for pedestrian detection.
Proceedings of the IEEE 6th Global Conference on Consumer Electronics, 2017

2016
Low-power dual-precision table-based function evaluation supporting dynamic precision changes.
Proceedings of the 2016 IEEE Asia Pacific Conference on Circuits and Systems, 2016

Hardware design of histograms of oriented gradients based on local binary pattern and binarization.
Proceedings of the 2016 IEEE Asia Pacific Conference on Circuits and Systems, 2016

2015
Table Size Reduction Methods for Faithfully Rounded Lookup-Table-Based Multiplierless Function Evaluation.
IEEE Trans. Circuits Syst. II Express Briefs, 2015

An OpenGL ES 2.0 3D graphics SoC with versatile HW/SW development support.
Proceedings of the VLSI Design, Automation and Test, 2015

Low-power and high-performance design of OpenGL ES 2.0 graphics processing unit for mobile applications.
Proceedings of the 2015 IEEE International Conference on Digital Signal Processing, 2015

2014
VLSI implementations of stereo matching using Dynamic Programming.
Proceedings of the Technical Papers of 2014 International Symposium on VLSI Design, 2014

Design of low-leakage multi-port SRAM for register file in graphics processing unit.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

Design and Implementation of Multiple-Vehicle Detection and Tracking Systems with Machine Learning.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

Compression of Lookup Table for Piecewise Polynomial Function Evaluation.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

VLSI implementation of belief-propagation-based stereo matching with linear-model message update.
Proceedings of the 2014 IEEE Asia Pacific Conference on Circuits and Systems, 2014

2013
Design of Hardware Function Evaluators Using Low-Overhead Nonuniform Segmentation With Address Remapping.
IEEE Trans. Very Large Scale Integr. Syst., 2013

Low-Cost FIR Filter Designs Based on Faithfully Rounded Truncated Multiple Constant Multiplication/Accumulation.
IEEE Trans. Circuits Syst. II Express Briefs, 2013

Design of a programmable vertex processor in OpenGL ES 2.0 mobile graphics processing units.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013

2012
Two-Level Hardware Function Evaluation Based on Correction of Normalized Piecewise Difference Functions.
IEEE Trans. Circuits Syst. II Express Briefs, 2012

Low latency design of Depth-Image-Based Rendering using hybrid warping and hole-filling.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Low-cost designs of rectangular to polar coordinate converters for digital communication.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2012

Asynchronous AHB bus interface designs in a multiple-clock-domain graphics system.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2012

2011
Design and Application of Faithfully Rounded and Truncated Multipliers With Combined Deletion, Reduction, Truncation, and Rounding.
IEEE Trans. Circuits Syst. II Express Briefs, 2011

Designs of angle-rotation in digital frequency synthesizer/mixer using multi-stage architectures.
Proceedings of the Conference Record of the Forty Fifth Asilomar Conference on Signals, 2011

2010
Low Area/Power Synthesis Using Hybrid Pass Transistor/CMOS Logic Cells in Standard Cell-Based Design Environment.
IEEE Trans. Circuits Syst. II Express Briefs, 2010

A new non-uniform segmentation and addressing remapping strategy for hardware-oriented function evaluators based on polynomial approximation.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Design of table-based function evaluators with reduced memory size Using a bottom-up non-uniform segmentation method.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

2009
Low Cost Design of an Advanced Encryption Standard (AES) Processor Using a New Common-Subexpression-Elimination Algorithm.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2009

An 8.69 Mvertices/s 278 Mpixels/s tile-based 3D graphics SoC HW/SW development for consumer electronics.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
An automatic hardware generator for special arithmetic functions using various ROM-based approximation approaches.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

Area oriented pass-transistor logic synthesis using buffer elimination and layout compaction.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

Efficient designs of floating-point CORDIC rotation and vectoring operations.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

Efficient pre-clipping and clipping algorithms for 3D graphics geometry computation.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

2006
Memory-free low-cost designs of advanced encryption standard using common subexpression elimination for subfunctions in transformations.
IEEE Trans. Circuits Syst. I Regul. Pap., 2006

Novel Memory Organization and Circuit Designs for Efficient Data Access in Applications of 3D Graphics and Multimedia Coding.
Proceedings of the 14th IEEE International Workshop on Memory Technology, 2006

Efficient Pass-Transistor-Logic Synthesis for Sequential Circuits.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006

An Automatic Cache Generator Based on Content-Addressable Memory.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006

2005
Low-error carry-free fixed-width multipliers with low-cost compensation circuits.
IEEE Trans. Circuits Syst. II Express Briefs, 2005

Efficient VLSI Implementations of Fast Multiplierless Approximated DCT Using Parameterized Hardware Modules for Silicon Intellectual Property Design.
IEEE Trans. Circuits Syst. I Regul. Pap., 2005

A Cell-Driven Multiplier Generator with Delay Optimization of Partial Products Compression and an Efficient Partition Technique for the Final Addition.
IEICE Trans. Inf. Syst., 2005

An efficient pass-transistor-logic synthesizer using multiplexers and inverters only.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

2004
Para-CORDIC: parallel CORDIC rotation algorithm.
IEEE Trans. Circuits Syst. I Regul. Pap., 2004

A memory-efficient and high-speed sine/cosine generator based on parallel CORDIC rotations.
IEEE Signal Process. Lett., 2004

2003
Design and implementation of a video-oriented network-interface-card system.
Proceedings of the 2003 Asia and South Pacific Design Automation Conference, 2003

2002
High-performance Multiplexer-based Logic Synthesis Using Pass-transistor Logic.
VLSI Design, 2002

Partition methodology for the final adder in a tree-structure parallel multiplier generator.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2002, 2002

2001
Parallel, Pipelined and Folded Architectures for Computation of 1-D and 2-D DCT in Image and Video Codec.
J. VLSI Signal Process., 2001

A new hardware-efficient algorithm and architecture for computation of 2-D DCTs on a linear array.
IEEE Trans. Circuits Syst. Video Technol., 2001

2000
Redundant Constant-Factor Implementation of Multi-Dimensional CORDIC and Its Application to Complex SVD.
J. VLSI Signal Process., 2000

VLSI design of an efficient embedded zerotree wavelet coder with function of digital watermarking.
IEEE Trans. Consumer Electron., 2000

Low-cost unified architectures for the computation of discrete trigonometric transforms.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
A cost-efficient and fully-pipelinable architecture for DCT/IDCT.
IEEE Trans. Consumer Electron., 1999

A high-speed constant-factor redundant CORDIC processor without extra correcting or scaling iterations.
Proceedings of the 1999 International Symposium on Circuits and Systems, ISCAS 1999, Orlando, Florida, USA, May 30, 1999

Design and performance verification of ALUs for 64-bit 8-issue superscalar microprocessors using 0.25 um CMOS technology.
Proceedings of the 6th IEEE International Conference on Electronics, Circuits and Systems, 1999

New hardware-efficient algorithm and architecture for the computation of 2-D DCT on a linear systolic array.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

A high-throughput, low power architecture and its VLSI implementation for DFT/IDFT computation.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Design, Implementation and Analysis of a New Redundant CORDIC Processor with Constant Scaling Factor and Regular Structure.
J. VLSI Signal Process., 1998

1997
New unified VLSI architectures for computing DFT and other transforms.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Parallel singular value decomposition of complex matrices using multidimensional CORDIC algorithms.
IEEE Trans. Signal Process., 1996

1995
Householder CORDIC Algorithms.
IEEE Trans. Computers, 1995

Adaptive Jacobi method for parallel singular value decompositions.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
Parallel processing of complex data using quaternion and pseudo-quaternion CORDIC algorithms.
Proceedings of the International Conference on Application Specific Array Processors, 1994

1991
The CORDIC Householder algorithm.
Proceedings of the 10th IEEE Symposium on Computer Arithmetic, 1991

