Cong Hao

Orcid: 0000-0002-2541-8767

According to our database, Cong Hao authored at least 95 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







AutoAI2C: An Automated Hardware Generator for DNN Acceleration on Both FPGA and ASIC.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

Programmable Analog System Benchmarks Leading to Efficient Analog Computation Synthesis.
ACM Trans. Reconfigurable Technol. Syst., March, 2024

Exploring and Exploiting Runtime Reconfigurable Floating Point Precision in Scientific Computing: a Case Study for Solving PDEs.
CoRR, 2024

Residual-INR: Communication Efficient On-Device Learning Using Implicit Neural Representation.
CoRR, 2024

ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model.
CoRR, 2024

Understanding the Performance and Estimating the Cost of LLM Fine-Tuning.
CoRR, 2024

Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond.
Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, 2024

Hyperdimensional Computing vs. Neural Networks: Comparing Architecture and Learning Process.
Proceedings of the 25th International Symposium on Quality Electronic Design, 2024

LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024

Ph.D. Project: Modernizing High-Level Hardware Design Workflows.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024

GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization.
Softw. Pract. Exp., November, 2023

IronMan-Pro: Multiobjective Design Space Exploration in HLS via Reinforcement Learning and Graph Neural Network-Based Modeling.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., March, 2023

PreAxC: Error Distribution Prediction for Approximate Computing Quality Control using Graph Neural Networks.
Proceedings of the 24th International Symposium on Quality Electronic Design, 2023

Extensible and Efficient Proxy for Neural Architecture Search.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-Level Sparsity via Mixture-of-Experts.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Rapid-INR: Storage Efficient CPU-Free DNN Training Using Implicit Neural Representation.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

From Acceleration to Accelerating Acceleration: Modernizing the Accelerator Landscape using High-Level Synthesis.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

LightningSim: Fast and Accurate Trace-Based Simulation for High-Level Synthesis.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

Hardware/Software Co-design for Machine Learning Accelerators.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

M5: Multi-modal Multi-task Model Mapping on Multi-FPGA with Accelerator Configuration Search.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

Gamora: Graph Learning based Symbolic Reasoning for Large-Scale Boolean Networks.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design.
CoRR, 2022

Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems.
CoRR, 2022

FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming.
CoRR, 2022

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity.
CoRR, 2022

GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration.
CoRR, 2022

Hybrid Graph Models for Logic Optimization via Spatio-Temporal Information.
CoRR, 2022

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unsupervised Learning for Combinatorial Optimization with Principled Objective Relaxation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

RT-DNAS: Real-Time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2022, 2022

Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

AI-assisted Synthesis in Next Generation EDA: Promises, Challenges, and Prospects.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

ScaleHLS: A New Scalable High-Level Synthesis Framework on Multi-Level Intermediate Representation.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

H2H: heterogeneous model to heterogeneous system mapping with computation and communication awareness.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

High-level synthesis performance prediction using GNNs: benchmarking, modeling, and advancing.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

LOSTIN: Logic Optimization via Spatio-Temporal Information with Hybrid Graph Models.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

Mask-Net: A Hardware-efficient Object Detection Network with Masked Region Proposals.
Proceedings of the 33rd IEEE International Conference on Application-specific Systems, 2022

Robotic Computing on FPGAs: Current Progress, Research Challenges, and Opportunities.
Proceedings of the 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2022

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization.
IEEE Trans. Computers, 2021

Improving the Generalization Ability of Deep Neural Networks for Cross-Domain Visual Recognition.
IEEE Trans. Cogn. Dev. Syst., 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Codesign.
IEEE Des. Test, 2021

Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation.
CoRR, 2021

ScaleHLS: Scalable High-Level Synthesis through MLIR.
CoRR, 2021

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration.
CoRR, 2021

On-FPGA Training with Ultra Memory Reduction: A Low-Precision Tensor Method.
CoRR, 2021

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design.
CoRR, 2021

Adversarial Graph Augmentation to Improve Graph Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Generic Neural Architecture Search via Regression.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark.
Proceedings of the 9th International Conference on Learning Representations, 2021

IRONMAN: GNN-assisted Design Space Exploration in High-Level Synthesis via Reinforcement Learning.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

Workload-Aware Approximate Computing Configuration.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices.
Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection.
CoRR, 2019

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices.
CoRR, 2019

A Hybrid GPU + FPGA System Design for Autonomous Driving Cars.
Proceedings of the 2019 IEEE International Workshop on Signal Processing Systems, 2019

T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

µL2Q: An Ultra-Low Loss Quantization Method for DNN Compression.
Proceedings of the International Joint Conference on Neural Networks, 2019

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving.
Proceedings of the International Conference on Computer-Aided Design, 2019

Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

TSV Assignment of Thermal and Wirelength Optimization for 3D-IC Routing.
Proceedings of the 28th International Symposium on Power and Timing Modeling, 2018

Economical Smart Home Scheduling by Cuckoo Search optimization via Levy Flight.
Proceedings of the IEEE 61st International Midwest Symposium on Circuits and Systems, 2018

Triangle Counting and Truss Decomposition using FPGA.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

The Sixth Visual Object Tracking VOT2018 Challenge Results.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Interconnection Allocation Between Functional Units and Registers in High-Level Synthesis.
IEEE Trans. Very Large Scale Integr. Syst., 2017

A Unified Scheduling Approach for Power and Resource Optimization With Multiple Vdd or/and Vth in High-Level Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

An Efficient Multi-Level Algorithm for 3D-IC TSV Assignment.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2017

3D-IC signal TSV assignment for thermal and wirelength optimization.
Proceedings of the 27th International Symposium on Power and Timing Modeling, 2017

A particle swarm optimization and branch and bound based algorithm for economical smart home scheduling.
Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

Application of on-line machine learning in optimization algorithms: A case study for local search.
Proceedings of the 2017 9th Computer Science and Electronic Engineering Conference, 2017

Leakage-Power-Aware Scheduling With Dual-Threshold Voltage Design.
IEEE Trans. Very Large Scale Integr. Syst., 2016

An efficient algorithm for 3D-IC TSV assignment.
Proceedings of the 14th IEEE International New Circuits and Systems Conference, 2016

Thermal-aware floorplanning for NoC-sprinting.
Proceedings of the IEEE 59th International Midwest Symposium on Circuits and Systems, 2016

Economical smart home scheduling for single and multiple users.
Proceedings of the IEEE 59th International Midwest Symposium on Circuits and Systems, 2016

Power-efficient partitioning and cluster generation design for application-specific Network-on-Chip.
Proceedings of the International SoC Design Conference, 2016

Primal-dual method based simultaneous functional unit and register binding.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

Simultaneous scheduling and binding for resource usage and interconnect complexity reduction in high-level synthesis.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

Leakage Power Aware Scheduling in High-Level Synthesis.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2014

Network simplex method based Multiple Voltage Scheduling in Power-efficient High-level synthesis.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

Power and resource aware scheduling with multiple voltages.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

Timing and resource constrained leakage power aware scheduling in high-level synthesis.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

Interconnection allocation between functional units and registers in High-Level Synthesis.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

Port assignment for interconnect reduction in high-level synthesis.
Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, 2012
