Naigang Wang

Orcid: 0000-0001-7664-0061

According to our database, Naigang Wang authored at least 40 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Generative AI Through CAS Lens: An Integrated Overview of Algorithmic Optimizations, Architectural Advances, and Automated Designs.

IEEE J. Emerg. Sel. Topics Circuits Syst., June, 2025

Guest Editorial Generative Artificial Intelligence Compute: Algorithms, Implementations, and Applications to CAS.

IEEE J. Emerg. Sel. Topics Circuits Syst., June, 2025

DiaBlo: Diagonal Blocks Are Sufficient For Finetuning.

CoRR, June, 2025

EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media.

Ferhat Demirkiran, Karthik Swaminathan, Navid Ibtehaj Nizam, Stefan T. Radev, Kaoutar El Maghraoui, et al.

CoRR, May, 2025

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization.

Christopher Brissette, George M. Slota, et al.

Trans. Mach. Learn. Res., 2025

CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization.

Trans. Mach. Learn. Res., 2025

Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System.

Kaoutar El Maghraoui, et al.

IEEE Comput. Archit. Lett., 2025

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization.

IEEE Access, 2025

No Time to Lose: Enabling Real-Time Fluorescence Lifetime Imaging on Resource-constrained FPGAs Through Efficient Scheduling.

Aporva Amarnath, Karthik Swaminathan, et al.

Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025

2024

Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing.

Aporva Amarnath, Karthik Swaminathan, et al.

CoRR, 2024

Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging.

Aporva Amarnath, Karthik Swaminathan, Stefan T. Radev, et al.

CoRR, 2024

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization.

CoRR, 2024

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization.

Aniruddha Nrusimha, et al.

CoRR, 2024

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization.

CoRR, 2024

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths.

Chun-Fu Richard Chen, et al.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts.

Mohammed Nowaz Rabbani Chowdhury, Kaoutar El Maghraoui, Christopher D. Carothers, et al.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2022

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling.

IEEE J. Solid State Circuits, 2022

Deep Compression of Pre-trained Transformer Models.

Chi-Chun (Charlie) Liu, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi Srinivasan, et al.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

All at Once Network Quantization via Collaborative Knowledge Transfer.

Kailash Gopalakrishnan, et al.

CoRR, 2021

A Comprehensive Survey on Hardware-Aware Neural Architecture Search.

Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, et al.

CoRR, 2021

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling.

Proceedings of the IEEE International Solid-State Circuits Conference, 2021

RaPiD: AI Accelerator for Ultra-low Precision Training and Inference.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

4-Bit Quantization of LSTM-Based Speech Recognition Models.

Mauricio J. Serrano, Swagath Venkataramani, Brian Kingsbury, Kailash Gopalakrishnan, et al.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Hardware-Aware Neural Architecture Search: Survey and Taxonomy.

Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, et al.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

2020

Efficient AI System Design With Cross-Layer Approximate Computing.

Proc. IEEE, 2020

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference.

Proceedings of the IEEE Symposium on VLSI Circuits, 2020

Ultra-Low Precision 4-bit Training of Deep Neural Networks.

Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, et al.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training.

Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, et al.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Innovate Practices on CyberSecurity of Hardware Semiconductor Devices.

Alfred L. Crouch, Jennifer Dworak, Lakshmi Ramakrishnan, Scott McWilliams, Franco Stellari, et al.

Proceedings of the 37th IEEE VLSI Test Symposium, 2019

Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks.

Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, et al.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks.

Naresh R. Shanbhag, Kailash Gopalakrishnan, et al.

Proceedings of the 7th International Conference on Learning Representations, 2019

DLFloat: A 16-b Floating Point Format Designed for Deep Learning Training and Inference.

Bruce M. Fleischer, Silvia M. Mueller, Kailash Gopalakrishnan, et al.

Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

2018

A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference.

Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

Training Deep Neural Networks with 8-bit Floating Point Numbers.

Kailash Gopalakrishnan, et al.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Across the Stack Opportunities for Deep Learning Acceleration.

Proceedings of the International Symposium on Low Power Electronics and Design, 2018

Novel IC Sub-Threshold IDDQ Signature And Its Relationship To Aging During High Voltage Stress.

Franco Stellari, et al.

Proceedings of the 48th European Solid-State Device Research Conference, 2018

2015

An 82%-efficient multiphase voltage-regulator 3D interposer with on-chip magnetic inductors.

Eugene J. O'Sullivan, Michele Petracca, Luca P. Carloni, William J. Gallagher, Kenneth L. Shepard, et al.

Proceedings of the Symposium on VLSI Circuits, 2015

2013

A 2.5D Integrated Voltage Regulator Using Coupled-Magnetic-Core Inductors on Silicon Interposer.

Eugene J. O'Sullivan, Bucknell C. Webb, Lubomyr T. Romankiw, Michele Petracca, Robert E. Fontana Jr., Ioannis Kymissis, Angel V. Peterchev, Luca P. Carloni, William J. Gallagher, Kenneth L. Shepard, et al.

IEEE J. Solid State Circuits, 2013

2012

A 2.5D integrated voltage regulator using coupled-magnetic-core inductors on silicon interposer delivering 10.8A/mm².

Eugene J. O'Sullivan, Bucknell C. Webb, Lubomyr T. Romankiw, Michele Petracca, Robert E. Fontana Jr., Ioannis Kymissis, Angel V. Peterchev, Luca P. Carloni, William J. Gallagher, Kenneth L. Shepard, et al.

Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012
