Zhewei Yao

CoRR, 2023

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers.

[BibT_eX]

[DOI]

CoRR, 2023

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.

[BibT_eX]

[DOI]

CoRR, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.

[BibT_eX]

[DOI]

CoRR, 2023

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats.

[BibT_eX]

[DOI]

Xiaoxia Wu

CoRR, 2023

Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?

[BibT_eX]

[DOI]

Pareesa Ameneh Golnari

CoRR, 2023

A Comprehensive Study on Post-Training Quantization for Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.

[BibT_eX]

[DOI]

Xiaoxia Wu

Cheng Li

CoRR, 2023

Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.

[BibT_eX]

[DOI]

Xiaoxia Wu

Cheng Li

Proceedings of the International Conference on Machine Learning, 2023

DySR: Adaptive Super-Resolution via Algorithm and System Co-design.

[BibT_eX]

[DOI]

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.

[BibT_eX]

[DOI]

CoRR, 2022

BiFeat: Supercharge GNN Training via Graph Feature Quantization.

[BibT_eX]

[DOI]

CoRR, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient.

[BibT_eX]

[DOI]

CoRR, 2022

Hessian-Aware Pruning and Optimal Neural Implant.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.

[BibT_eX]

[DOI]

Ammar Ahmad Awan

Jeff Rasley

Proceedings of the International Conference on Machine Learning, 2022

How Much Can CLIP Benefit Vision-and-Language Tasks?

[BibT_eX]

[DOI]

Proceedings of the Tenth International Conference on Learning Representations, 2022

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Inexact Nonconvex Newton-Type Methods.

[BibT_eX]

[DOI]

INFORMS J. Optim., January, 2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models.

[BibT_eX]

[DOI]

CoRR, 2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition.

[BibT_eX]

[DOI]

CoRR, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference.

[BibT_eX]

[DOI]

CoRR, 2021

Hessian-Aware Pruning and Optimal Neural Implant.

[BibT_eX]

[DOI]

CoRR, 2021

HAWQ-V3: Dyadic Neural Network Quantization.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

I-BERT: Integer-only BERT Quantization.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training.

[BibT_eX]

[DOI]

Proceedings of the 38th International Conference on Machine Learning, 2021

What's Hidden in a One-layer Randomly Weighted Transformer?

[BibT_eX]

[DOI]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

HAWQV3: Dyadic Neural Network Quantization.

[BibT_eX]

[DOI]

CoRR, 2020

Benchmarking Semi-supervised Federated Learning.

[BibT_eX]

[DOI]

CoRR, 2020

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.

[BibT_eX]

[DOI]

CoRR, 2020

Rethinking Batch Normalization in Transformers.

[BibT_eX]

[DOI]

CoRR, 2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks.

[BibT_eX]

[DOI]

N. Benjamin Erichson

Michael W. Mahoney

Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, 2020

PowerNorm: Rethinking Batch Normalization in Transformers.

[BibT_eX]

[DOI]

Proceedings of the 37th International Conference on Machine Learning, 2020

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.

[BibT_eX]

[DOI]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

ZeroQ: A Novel Zero Shot Quantization Framework.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PyHessian: Neural Networks Through the Lens of the Hessian.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Inefficiency of K-FAC for Large Batch Size Training.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

改进型循环生成对抗网络的血管内超声图像增强 (Improved CycleGANs for Intravascular Ultrasound Image Enhancement).

[BibT_eX]

[DOI]

计算机科学, 2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.

[BibT_eX]

[DOI]

CoRR, 2019

ANODEV2: A Coupled Neural ODE Evolution Framework.

[BibT_eX]

[DOI]

CoRR, 2019

Residual Networks as Nonlinear Systems: Stability Analysis using Linearization.

[BibT_eX]

[DOI]

CoRR, 2019

Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data.

[BibT_eX]

[DOI]

CoRR, 2019

ANODEV2: A Coupled Neural ODE Framework.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision.

[BibT_eX]

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Trust Region Based Adversarial Attack on Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Parameter Re-Initialization through Cyclical Batch Size Schedules.

[BibT_eX]

[DOI]

CoRR, 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent.

[BibT_eX]

[DOI]

CoRR, 2018

Large batch size training of neural networks with adversarial training and second-order information.

[BibT_eX]

[DOI]

CoRR, 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

A Hybrid Adaptive MCMC Algorithm in Function Spaces.

[BibT_eX]

[DOI]

SIAM/ASA J. Uncertain. Quantification, 2017

On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference.

[BibT_eX]

[DOI]

Zixi Hu