Reza Yazdani

ORCID: 0000-0002-7949-6453

According to our database, Reza Yazdani authored at least 25 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.
CoRR, 2024

2023
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks.
ACM Trans. Embed. Comput. Syst., March 2023

The hybrid DHP method for evaluation, ranking and selection of green suppliers in the supply chain.
Int. J. Math. Oper. Res., 2023

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers.
CoRR, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
CoRR, 2023

Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.
Proceedings of the International Conference on Machine Learning, 2023

2022
A lion optimization algorithm for an integrating maintenance planning and production scheduling problem with a total absolute deviation of completion times objective.
Soft Comput., December 2022

Exploring the impacts of COVID-19 pandemic on risks faced by infrastructure projects in Pakistan.
Int. J. Appl. Decis. Sci., 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022

DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022

2021
ZeRO-Offload: Democratizing Billion-Scale Model Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

2020
LAWS: Locality-AWare Scheme for Automatic Speech Recognition.
IEEE Trans. Computers, 2020

2019
Ultra low-power, high-performance accelerator for speech recognition.
PhD thesis, 2019

A Low-Power, High-Performance Speech Recognition Accelerator.
IEEE Trans. Computers, 2019

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019

POSTER: Leveraging Run-Time Feedback for Efficient ASR Acceleration.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
The Dark Side of DNN Pruning.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
Low-Power Automatic Speech Recognition Through a Mobile GPU and a Viterbi Accelerator.
IEEE Micro, 2017

UNFOLD: a memory-efficient speech recognizer using on-the-fly WFST composition.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
An ultra low-power hardware accelerator for automatic speech recognition.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Fault-tolerant 3-D network-on-chip design using dynamic link sharing.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
