Steve Dai

Orcid: 0000-0002-5045-1964

According to our database, Steve Dai authored at least 34 papers between 2010 and 2023.

Bibliography

2023
A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm.
IEEE J. Solid State Circuits, 2023

Gamora: Graph Learning based Symbolic Reasoning for Large-Scale Boolean Networks.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Efficient Transformer Inference with Statically Structured Sparse Attention.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.
IEEE Trans. Computers, 2022

A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training.
Proceedings of the International Conference on Machine Learning, 2022

2021
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.
CoRR, 2021

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking.
CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
Proceedings of Machine Learning and Systems 2021, 2021

Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
Accelerating Chip Design With Machine Learning.
IEEE Micro, 2020

2019
A 1.4 GHz 695 Giga RISC-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.
Proceedings of the International Conference on Computer-Aided Design, 2019

Improving Scalability of Exact Modulo Scheduling with Specialized Conflict-Driven Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips.
IEEE Micro, 2018

High-level synthesis with timing-sensitive information flow enforcement.
Proceedings of the International Conference on Computer-Aided Design, 2018

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

A Scalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT Formulation.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning.
Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017
Architecture and Synthesis for Area-Efficient Pipelining of Irregular Loop Nests.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Enabling adaptive loop pipelining in high-level synthesis.
Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2015
High-level Synthesis for Low-power Design.
IPSJ Trans. Syst. LSI Des. Methodol., 2015

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs.
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Area-efficient pipelining for FPGA-targeted high-level synthesis.
Proceedings of the 52nd Annual Design Automation Conference, 2015

2014
Multithreaded pipeline synthesis for data-parallel kernels.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2014

Flushing-Enabled Loop Pipelining for High-Level Synthesis.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013
Design, simulation, and evaluation of imaging oximeters.
Proceedings of the Digital Photography IX, 2013

2011
Design and Evaluation of Identifiable Key-Click Signals for Mobile Devices.
IEEE Trans. Haptics, 2011

2010
Redundant coding of simulated tactile key clicks with audio signals.
Proceedings of the 2010 IEEE Haptics Symposium, 2010

