Chengming Zhang

ORCID: 0000-0003-3008-9133

Affiliations:
  • Washington State University, Pullman, WA, USA
  • University of Alabama, AL, USA


According to our database, Chengming Zhang authored at least 18 papers between 2020 and 2023.

Bibliography

2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates.
CoRR, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.
Proc. VLDB Endow., 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021

Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
CoRR, 2020

waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

CurvaNet: Geometric Deep Learning based on Directional Curvature for 3D Shape Analysis.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

