Shaohuai Shi

ORCID: 0000-0002-1418-5160

According to our database, Shaohuai Shi authored at least 56 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FedImpro: Measuring and Improving Client Update in Federated Learning.
CoRR, 2024

ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
GossipFL: A Decentralized Federated Learning Framework With Sparsified and Adaptive Communication.
IEEE Trans. Parallel Distributed Syst., March, 2023

Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning.
IEEE Trans. Cloud Comput., 2023

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models.
CoRR, 2023

Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining.
CoRR, 2023

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs.
CoRR, 2023

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning.
CoRR, 2023

Eva: A General Vectorized Approximation Framework for Second-order Optimization.
CoRR, 2023

A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games.
CoRR, 2023

FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training.
CoRR, 2023

Decoupling the All-Reduce Primitive for Accelerating Distributed Deep Learning.
CoRR, 2023

Accelerating Distributed K-FAC with Efficient Collective Communication and Scheduling.
Proceedings of the IEEE INFOCOM 2023, 2023

PipeMoE: Accelerating Mixture-of-Experts through Adaptive Pipelining.
Proceedings of the IEEE INFOCOM 2023, 2023

Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Evaluation and Optimization of Gradient Compression for Distributed Deep Learning.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023

DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023

2022
An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning.
CoRR, 2022

Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters.
CoRR, 2022

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning.
Proceedings of the International Conference on Machine Learning, 2022

EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2021

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning.
IEEE Netw., 2021

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks.
CoRR, 2021

Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans.
CoRR, 2021

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters.
Proceedings of Machine Learning and Systems 2021, 2021

Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning.
Proceedings of the 40th IEEE Conference on Computer Communications, 2021

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks.
Proceedings of the 41st IEEE International Conference on Distributed Computing Systems, 2021

Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges.
CoRR, 2020

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey.
CoRR, 2020

Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs.
CoRR, 2020

Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

FADNet: A Fast and Accurate Network for Disparity Estimation.
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020

Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format.
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

Layer-Wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees.
Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 2020

Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
Understanding Top-k Sparsification in Distributed Deep Learning.
CoRR, 2019

MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms.
Proceedings of the 2019 IEEE Conference on Computer Communications, 2019

A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019

Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018
MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms.
CoRR, 2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes.
CoRR, 2018

Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs.
CoRR, 2018

A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs.
Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, 2018

2017
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs.
CoRR, 2017

Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose.
CoRR, 2017

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units.
CoRR, 2017

Supervised Learning Based Algorithm Selection for Deep Neural Networks.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Performance Evaluation of Deep Learning Tools in Docker Containers.
Proceedings of the 3rd International Conference on Big Data Computing and Communications, 2017

2016
Benchmarking State-of-the-Art Deep Learning Software Tools.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016

2011
Mixed Precision Method for GPU-based FFT.
Proceedings of the 14th IEEE International Conference on Computational Science and Engineering, 2011

2010
The GPU-based String Matching System in Advanced AC Algorithm.
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010

