Brucek Khailany

ORCID: 0000-0002-7584-3489

According to our database, Brucek Khailany authored at least 80 papers between 1998 and 2023.

Bibliography

2023
Machine Learning and Algorithms: Let Us Team Up for EDA.
IEEE Des. Test, February, 2023

A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm.
IEEE J. Solid State Circuits, 2023

ChipNeMo: Domain-Adapted LLMs for Chip Design.
CoRR, 2023

VerilogEval: Evaluating Large Language Models for Verilog Code Generation.
CoRR, 2023

NVCell 2: Routability-Driven Standard Cell Layout in Advanced Nodes with Lattice Graph Routability Model.
Proceedings of the 2023 International Symposium on Physical Design, 2023

AutoDMP: Automated DREAMPlace-based Macro Placement.
Proceedings of the 2023 International Symposium on Physical Design, 2023

An Adversarial Active Sampling-Based Data Augmentation Framework for AI-Assisted Lithography Modeling.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Invited Paper: VerilogEval: Evaluating Large Language Models for Verilog Code Generation.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

GenFuzz: GPU-accelerated Hardware Fuzzing using Genetic Algorithm with Multiple Inputs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Late Breaking Results: Test Selection For RTL Coverage By Unsupervised Learning From Fast Functional Simulation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Efficient Transformer Inference with Statically Structured Sparse Attention.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update.
IEEE Trans. Computers, 2022

HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression.
CoRR, 2022

An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design.
CoRR, 2022

Large Scale Mask Optimization Via Convolutional Fourier Neural Operator and Litho-Guided Self Training.
CoRR, 2022

A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits 2022), 2022

XT-PRAGGMA: Crosstalk Pessimism Reduction Achieved with GPU Gate-level Simulations and Machine Learning.
Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD, 2022

AutoCRAFT: Layout Automation for Custom Circuits in Advanced FinFET Technologies.
Proceedings of the ISPD 2022: International Symposium on Physical Design, Virtual Event, Canada, March 27, 2022

From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training.
Proceedings of the International Conference on Machine Learning, 2022

TransSizer: A Novel Transformer-Based Fast Gate Sizer.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

GATSPI: GPU accelerated gate-level simulation for power improvement.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Generic lithography modeling with dual-band optics-inspired neural networks.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Generative self-supervised learning for gate sizing: invited.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021
DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning.
CoRR, 2021

Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update.
CoRR, 2021

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking.
CoRR, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
CoRR, 2021

Simba: scaling deep-learning inference with chiplet-based architecture.
Commun. ACM, 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference.
Proceedings of Machine Learning and Systems 2021, 2021

3.2 The A100 Datacenter GPU and Ampere Architecture.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

Optimizing VLSI Implementation with Reinforcement Learning - ICCAD Special Session Paper.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

IPA: Floorplan-Aware SystemC Interconnect Performance Modeling and Generation for HLS-based SoCs.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

2021 ICCAD CAD Contest Problem C: GPU Accelerated Logic Rewriting.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

Parasitic-Aware Analog Circuit Sizing with Graph Neural Networks and Bayesian Optimization.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

MAVIREC: ML-Aided Vectored IR-Drop Estimation and Classification.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Accelerating Chip Design With Machine Learning.
IEEE Micro, 2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.
IEEE J. Solid State Circuits, 2020

MAVIREC: ML-Aided Vectored IR-Drop Estimation and Classification.
CoRR, 2020

Accelerating Chip Design with Machine Learning.
Proceedings of the MLCAD '20: 2020 ACM/IEEE Workshop on Machine Learning for CAD, 2020

Problem C: GPU Accelerated Logic Re-simulation (Invited Talk).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

Opportunities for RTL and Gate Level Simulation using GPUs (Invited Talk).
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

GRANNITE: Graph Neural Network Inference for Transferable Power Estimation.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

PowerNet: Transferable Dynamic IR Drop Estimation via Maximum Convolutional Neural Network.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning.
Proceedings of the 25th Asia and South Pacific Design Automation Conference, 2020

2019
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.
Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

PRIMAL: Power Inference using Machine Learning.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

High Performance Graph Convolutional Networks with Applications in Testability Analysis.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET.
Proceedings of the 25th IEEE International Symposium on Asynchronous Circuits and Systems, 2019

2018
Hardware-Enabled Artificial Intelligence.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018


2017
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

2016
A real-time energy-efficient superpixel hardware accelerator for mobile computer vision applications.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Modeling and Analysis of Power Supply Noise Tolerance with Fine-Grained GALS Adaptive Clocks.
Proceedings of the 22nd IEEE International Symposium on Asynchronous Circuits and Systems, 2016

2015
A Pausible Bisynchronous FIFO for GALS Systems.
Proceedings of the 21st IEEE International Symposium on Asynchronous Circuits and Systems, 2015

2013
GPU design in a power-limited era.
Proceedings of the 2013 IEEE International Conference on Microelectronic Systems Education, 2013

2012
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

2011
GPUs and the Future of Parallel Computing.
IEEE Micro, 2011

CudaDMA: optimizing GPU memory bandwidth via warp specialization.
Proceedings of the Conference on High Performance Computing Networking, 2011

2008
A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing.
IEEE J. Solid State Circuits, 2008

2004
Stream Processors: Programmability and Efficiency.
ACM Queue, 2004

Evaluating the Imagine Stream Architecture.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003
Programmable Stream Processors.
Computer, 2003

Exploring the VLSI Scalability of Stream Processors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2002
VLSI Design and Verification of the Imagine Processor.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

The Imagine Stream Processor.
Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Comparing Reyes and OpenGL on a Stream Architecture.
Proceedings of the 2002 ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2002

2001
Imagine: Media Processing with Streams.
IEEE Micro, 2001

2000
Efficient conditional operations for data-parallel architectures.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Register Organization for Media Processing.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1998
A Bandwidth-efficient Architecture for Media Processing.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

