Wencong Xiao

Orcid: 0000-0002-3043-522X

According to our database, Wencong Xiao authored at least 26 papers between 2015 and 2024.

Bibliography

2024
Multi-residual unit fusion and Wasserstein distance-based deep transfer learning for mill load recognition.
Signal Image Video Process., June, 2024

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.
CoRR, 2024

2023
GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning.
Proc. ACM Manag. Data, 2023

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving.
CoRR, 2023

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs.
CoRR, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.
Proceedings of the International Conference for High Performance Computing, 2023

2022
EasyScale: Accuracy-consistent Elastic Training for Deep Learning.
CoRR, 2022

Whale: Efficient Giant Model Training over Heterogeneous GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

2021
Zico: Efficient GPU Memory Sharing for Concurrent DNN Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

2020
Distributed Graph Computation Meets Machine Learning.
IEEE Trans. Parallel Distributed Syst., 2020

Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training.
CoRR, 2020

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

An empirical study on program failures of deep learning jobs.
Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

2019
BeamRaster: A Practical Fast Massive MU-MIMO System With Pre-Computed Precoders.
IEEE Trans. Mob. Comput., 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Gandiva: Introspective Cluster Scheduling for Deep Learning.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Scheduling CPU for GPU-based Deep Learning Jobs.
Proceedings of the ACM Symposium on Cloud Computing, 2018

2017
KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC.
Proceedings of the 26th Symposium on Operating Systems Principles, 2017

Tux²: Distributed Graph Computation for Machine Learning.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter.
Proceedings of the First Asia-Pacific Workshop on Networking, 2017

2015
GraM: scaling graph computation to the trillions.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

