Shaohuai Shi
ORCID: 0000-0002-1418-5160
According to our database, Shaohuai Shi authored at least 56 papers between 2010 and 2024.
Bibliography
2024
ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024
2023
GossipFL: A Decentralized Federated Learning Framework With Sparsified and Adaptive Communication.
IEEE Trans. Parallel Distributed Syst., March, 2023
IEEE Trans. Cloud Comput., 2023
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models.
CoRR, 2023
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining.
CoRR, 2023
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs.
CoRR, 2023
CoRR, 2023
CoRR, 2023
A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games.
CoRR, 2023
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training.
CoRR, 2023
CoRR, 2023
Accelerating Distributed K-FAC with Efficient Collective Communication and Scheduling.
Proceedings of the IEEE INFOCOM 2023, 2023
Proceedings of the IEEE INFOCOM 2023, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
2022
CoRR, 2022
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters.
CoRR, 2022
Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2021
IEEE Netw., 2021
CoRR, 2021
Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans.
CoRR, 2021
Proceedings of Machine Learning and Systems 2021, 2021
Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning.
Proceedings of the 40th IEEE Conference on Computer Communications, 2021
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks.
Proceedings of the 41st IEEE International Conference on Distributed Computing Systems, 2021
Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges.
CoRR, 2020
CoRR, 2020
CoRR, 2020
Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020
Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format.
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
Layer-Wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees.
Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain, August 29 - September 8, 2020
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020
2019
Proceedings of the 2019 IEEE Conference on Computer Communications, 2019
A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019
Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019
2018
CoRR, 2018
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes.
CoRR, 2018
Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs.
CoRR, 2018
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018
Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, 2018
2017
CoRR, 2017
Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose.
CoRR, 2017
Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units.
CoRR, 2017
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017
Proceedings of the 3rd International Conference on Big Data Computing and Communications, 2017
2016
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016
2011
Proceedings of the 14th IEEE International Conference on Computational Science and Engineering, 2011
2010
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010