Hayden Kwok-Hay So

ORCID: 0000-0002-6514-0237

Affiliations:
  • University of Hong Kong


According to our database, Hayden Kwok-Hay So authored at least 104 papers between 2005 and 2024.

Bibliography

2024
A Composable Dynamic Sparse Dataflow Architecture for Efficient Event-based Vision Processing on FPGA.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024

2023
A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

Random resistive memory-based deep extreme point learning machine for unified visual processing.
CoRR, 2023

SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features.
CoRR, 2023

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference.
CoRR, 2023

RSQP: Problem-specific Architectural Customization for Accelerated Convex Quadratic Optimization.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Towards Asynchronously Triggered Spiking Neural Network on FPGA for Event-based Vision.
Proceedings of the International Conference on Field Programmable Technology, 2023

SqueezeBlock: A Transparent Weight Compression Scheme for Deep Neural Networks.
Proceedings of the International Conference on Field Programmable Technology, 2023

Model-Platform Optimized Deep Neural Network Accelerator Generation through Mixed-Integer Geometric Programming.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

MSD: Mixing Signed Digit Representations for Hardware-efficient DNN Acceleration on FPGA with Heterogeneous Resources.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
NITI: Training Integer Neural Networks Using Integer-Only Arithmetic.
IEEE Trans. Parallel Distributed Syst., 2022

Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network.
IEEE Trans. Neural Networks Learn. Syst., 2022

REMOT: A Hardware-Software Architecture for Attention-Guided Multi-Object Tracking with Dynamic Vision Sensors on FPGAs.
Proceedings of the FPGA '22: The 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Virtual Event, USA, 27 February 2022, 2022

2021
High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

HAO: Hardware-aware Neural Architecture Optimization for Efficient Inference.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells.
Bioinform., 2020

Vision Guided Crop Detection in Field Robots using FPGA-Based Reconfigurable Computers.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers.
Proceedings of the 8th International Conference on Learning Representations, 2020

FTDL: An FPGA-tailored Architecture for Deep Learning Systems.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms.
ACM Trans. Reconfigurable Technol. Syst., 2019

Large-Scale Multi-Class Image-Based Cell Classification With Deep Learning.
IEEE J. Biomed. Health Informatics, 2019

Fringe Pattern Improvement and Super-Resolution Using Deep Learning in Digital Holography.
IEEE Trans. Ind. Informatics, 2019

A Real-Time Coprime Line Scan Super-Resolution System for Ultra-Fast Microscopy.
IEEE Trans. Biomed. Circuits Syst., 2019

High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing.
J. Imaging, 2019

Design of quadruple precision multiplier architectures with SIMD single and double precision support.
Integr., 2019

Computational Light Field Generation Using Deep Nonparametric Bayesian Learning.
IEEE Access, 2019

PACoGen: A Hardware Posit Arithmetic Core Generator.
IEEE Access, 2019

E-LSTM: Efficient Inference of Sparse LSTM on Embedded Heterogeneous System.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
Human somatic label-free bright-field cell images.
Dataset, November, 2018

Introduction to the Special Issue on Application-Specific Systems, Architectures and Processors.
J. Signal Process. Syst., 2018

An Unified Architecture for Single, Double, Double-Extended, and Quadruple Precision Division.
Circuits Syst. Signal Process., 2018

A Division-Free and Variable-Regularized LMS-Based Generalized Sidelobe Canceller for Adaptive Beamforming and Its Efficient Hardware Realization.
IEEE Access, 2018

Urban Farming in Myanmar: An Experiential Learning Project for Engineering and Science Students from Hong Kong and Myanmar.
Proceedings of the IEEE International Conference on Teaching, 2018

Architecture Generator for Type-3 Unum Posit Adder/Subtractor.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Performance-Driven System Generation for Distributed Vertex-Centric Graph Processing on Multi-FPGA Systems.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Universal number posit arithmetic generator on FPGA.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

2017
The First 25 Years of the FPL Conference: Significant Papers.
ACM Trans. Reconfigurable Technol. Syst., 2017

Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process.
IEEE Trans. Geosci. Remote. Sens., 2017

Area-Efficient Architecture for Dual-Mode Double Precision Floating Point Division.
IEEE Trans. Circuits Syst. I Regul. Pap., 2017

Proceedings of the 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017).
CoRR, 2017

Computational single-cell classification using deep learning on bright-field and phase images.
Proceedings of the Fifteenth IAPR International Conference on Machine Vision Applications, 2017

Towards Flexible Automatic Generation of Graph Processing Gateware.
Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2017

Ultra-low latency continuous block-parallel stream windowing using FPGA on-chip memory.
Proceedings of the International Conference on Field Programmable Technology, 2017

NnCore: A parameterized non-linear function generator for machine learning applications in FPGAs.
Proceedings of the International Conference on Field Programmable Technology, 2017

OLAF'17: Third International Workshop on Overlay Architectures for FPGAs.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Parameterizable Activation Function Generator for FPGA-Based Neural Network Applications.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

2016
FPGA High-level Synthesis versus Overlay: Comparisons on Computation Kernels.
SIGARCH Comput. Archit. News, 2016

Consistency Analysis for the Doubly Stochastic Dirichlet Process.
CoRR, 2016

Proceedings of the 2nd International Workshop on Overlay Architectures for FPGAs (OLAF 2016).
CoRR, 2016

A Soft Processor Overlay with Tightly-coupled FPGA Accelerator.
CoRR, 2016

High-throughput cellular imaging with high-speed asymmetric-detection time-stretch optical microscopy under FPGA platform.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2016

Towards FPGA-assisted spark: An SVM training acceleration case study.
Proceedings of the International Conference on ReConFigurable Computing and FPGAs, 2016

Dual-mode double precision division architecture.
Proceedings of the IEEE 59th International Midwest Symposium on Circuits and Systems, 2016

Taylor Series Based Architecture for Quadruple Precision Floating Point Division.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2016

Data-driven light field depth estimation using deep Convolutional Neural Networks.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Sparse Hierarchical Nonparametric Bayesian learning for light field representation and denoising.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Real-time object detection and classification for high-speed asymmetric-detection time-stretch optical microscopy on FPGA.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016

GraVF: A vertex-centric distributed graph processing framework on FPGAs.
Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

OLAF'16: Second International Workshop on Overlay Architectures for FPGAs.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

Vertex-Centric Graph Processing on FPGA.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Unsupervised tracking with a low computational cost using the doubly stochastic Dirichlet process mixture model.
Proceedings of the Image Processing: Machine Vision Applications IX, 2016

Architecture for quadruple precision floating point division with multi-precision support.
Proceedings of the 27th IEEE International Conference on Application-specific Systems, 2016

FPGA Overlays.
Proceedings of the FPGAs for Software Programmers, 2016

2015
Configurable Architectures for Multi-Mode Floating Point Adders.
IEEE Trans. Circuits Syst. I Regul. Pap., 2015

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay.
CoRR, 2015

Dual-mode double precision / two-parallel single precision floating point multiplier architecture.
Proceedings of the 2015 IFIP/IEEE International Conference on Very Large Scale Integration, 2015

Architecture for Dual-Mode Quadruple Precision Floating Point Adder.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

Accelerated cell imaging and classification on FPGAs for quantitative-phase asymmetric-detection time-stretch optical microscopy.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

QuickDough: A rapid FPGA loop accelerator design framework using soft CGRA overlay.
Proceedings of the 2015 International Conference on Field Programmable Technology, 2015

Significant papers from the first 25 years of the FPL conference.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Automatic Soft CGRA Overlay Customization for High-Productivity Nested Loop Acceleration on FPGAs.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014
Mixed-architecture process scheduling on tightly coupled reconfigurable computers.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Scheduling Mixed-Architecture Processes in Tightly Coupled FPGA-CPU Reconfigurable Computers.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Map-reduce processing of k-means algorithm with FPGA-accelerated computer cluster.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Design space exploration for sparse matrix-matrix multiplication on FPGAs.
Int. J. Circuit Theory Appl., 2013

Direct virtual memory access from FPGA for high-productivity heterogeneous computing.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

A Soft Coarse-Grained Reconfigurable Array Based High-level Synthesis Methodology: Promoting Design Productivity and Exploring Extreme FPGA Frequency.
Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

2012
Energy-efficient dataflow computations on FPGAs using application-specific coarse-grain architecture synthesis.
SIGARCH Comput. Archit. News, 2012

Design considerations of real-time adaptive beamformer for medical ultrasound research using FPGA and GPU.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Extending BORPH for shared memory reconfigurable computers.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only).
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

2011
Design space exploration of adaptive beamforming acceleration for bedside and portable medical ultrasound imaging.
SIGARCH Comput. Archit. News, 2011

Medical Ultrasound Imaging: To GPU or Not to GPU?
IEEE Micro, 2011

On IIR-based bit-stream multipliers.
Int. J. Circuit Theory Appl., 2011

A Model for Matrix Multiplication Performance on FPGAs.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2011

A Model for Peak Matrix Performance on FPGAs.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

2010
Zero-configuration identity-based IP network encryptor.
IEEE Trans. Consumer Electron., 2010

Dynamic power reduction of FPGA-based reconfigurable computers using precomputation.
SIGARCH Comput. Archit. News, 2010

Design space exploration for sparse matrix-matrix multiplication on FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2010

2009
Operation scheduling for FPGA-based reconfigurable computers.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

2008
A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH.
ACM Trans. Embed. Comput. Syst., 2008

Quad-level bit-stream signal processing on FPGAs.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

File system access from reconfigurable FPGA hardware processes in BORPH.
Proceedings of the FPL 2008, 2008

Direct sigma-delta modulated signal processing in FPGA.
Proceedings of the FPL 2008, 2008

Runtime Filesystem Support for Reconfigurable FPGA Hardware Processes in BORPH.
Proceedings of the 16th IEEE International Symposium on Field-Programmable Custom Computing Machines, 2008

2007
ASIC Design and Verification in an FPGA Environment.
Proceedings of the IEEE 2007 Custom Integrated Circuits Conference, 2007

2006
Improving Usability of FPGA-Based Reconfigurable Computers Through Operating System Support.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH.
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006

2005
An integrated debugging environment for reprogrammable hardware systems.
Proceedings of the Sixth International Workshop on Automated Debugging, 2005

