Steve Dai

Nathaniel Ross Pinckney

Proceedings of the IEEE International Solid-State Circuits Conference, 2026

2025

GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers.

CoRR, December, 2025

TurboSAT: Gradient-Guided Boolean Satisfiability Accelerated on GPU-CPU Hybrid System.

CoRR, November, 2025

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity.

Zichen Fan

Dennis Sylvester

Proceedings of the 62nd ACM/IEEE Design Automation Conference, 2025

2024

Survey of Machine Learning for Software-assisted Hardware Design Verification: Past, Present, and Prospect.

ACM Trans. Design Autom. Electr. Syst., 2024

2023

A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm.

IEEE J. Solid State Circuits, 2023

Gamora: Graph Learning based Symbolic Reasoning for Large-Scale Boolean Networks.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Efficient Transformer Inference with Statically Structured Sparse Attention.

Hasan Genc

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.

Jiawei Zhao

IEEE Trans. Computers, 2022

A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm.

Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training.

Charbel Sakr

Brian Zimmer

William J. Dally

Proceedings of the International Conference on Machine Learning, 2022

2021

Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.

Jiawei Zhao

CoRR, 2021

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking.

Alicia Klinefelter

Haoxing Ren

Nathaniel Ross Pinckney

CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.

CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs.

Dillon Huff

Pat Hanrahan

Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers.

Jacob R. Stevens

Anand Raghunathan

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

Accelerating Chip Design With Machine Learning.

Yanqing Zhang

Bryan Catanzaro

William J. Dally

IEEE Micro, 2020

2019

A 1.4 GHz 695 Giga Risc-V Inst/s 496-Core Manycore Processor With Mesh On-Chip Network and an All-Digital Synthesized PLL in 16nm CMOS.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

Nathaniel Ross Pinckney

Proceedings of the International Conference on Computer-Aided Design, 2019

Improving Scalability of Exact Modulo Scheduling with Specialized Conflict-Driven Learning.

Gustavo Angarita Velasquez

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

The Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric: Fast Architectures and Design Methodologies for Fast Chips.

IEEE Micro, 2018

High-level synthesis with timing-sensitive information flow enforcement.

Proceedings of the International Conference on Computer-Aided Design, 2018

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs.

Nitish Kumar Srivastava

Wenping Wang

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

A Scalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT Formulation.

Gai Liu

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

Fast and Accurate Estimation of Quality of Results in High-Level Synthesis with Machine Learning.

Evangeline F. Y. Young

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

2017

Architecture and Synthesis for Area-Efficient Pipelining of Irregular Loop Nests.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2017

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis.

Nitish Kumar Srivastava

Rajit Manohar

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Enabling adaptive loop pipelining in high-level synthesis.

Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

2015

High-level Synthesis for Low-power Design.

IPSJ Trans. Syst. LSI Des. Methodol., 2015

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs.

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Area-efficient pipelining for FPGA-targeted high-level synthesis.

Proceedings of the 52nd Annual Design Automation Conference, 2015

2014

Multithreaded pipeline synthesis for data-parallel kernels.

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2014

Flushing-Enabled Loop Pipelining for High-Level Synthesis.

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

2013

Design, simulation, and evaluation of imaging oximeters.
