Wencong Xiao

ORCID: 0000-0002-3043-522X

According to our database¹, Wencong Xiao authored at least 42 papers between 2015 and 2026.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of three.

Bibliography

2026

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training.

Andrea C. Arpaci-Dusseau

Remzi H. Arpaci-Dusseau

CoRR, April, 2026

DistRS: Disaggregated Reward Service for RLVR with Batch-Level Constraint.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

2025

Fast LLM Post-training via Decoupled and Fastest-of-N Speculation.

CoRR, November, 2025

Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., July, 2025

BootSeer: Analyzing and Mitigating Initialization Bottlenecks in Large-Scale LLM Training.

CoRR, July, 2025

Subdivision load identification of ball mill based on multi-domain feature extraction and UMAP-BOXGBoost.

Signal Image Video Process., April, 2025

Robust LLM Training Infrastructure at ByteDance.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

Voyager: Input-Adaptive Algebraic Transformations for High-Performance Graph Neural Networks.

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025

2024

Multi-residual unit fusion and Wasserstein distance-based deep transfer learning for mill load recognition.

Huazhi Xu

Xiaoyan Luo

Wencong Xiao

Signal Image Video Process., June, 2024

ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG.

IEEE Trans. Parallel Distributed Syst., 2024

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach.

CoRR, 2024

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.

CoRR, 2024

Crux: GPU-Efficient Communication Scheduling for Deep Learning Training.

Proceedings of the ACM SIGCOMM 2024 Conference, 2024

Llumnix: Dynamic Scheduling for Large Language Model Serving.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023

GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning.

Proc. ACM Manag. Data, 2023

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving.

CoRR, 2023

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs.

CoRR, 2023

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs.

Proceedings of the International Conference for High Performance Computing, 2023

2022

EasyScale: Accuracy-consistent Elastic Training for Deep Learning.

CoRR, 2022

Whale: Efficient Giant Model Training over Heterogeneous GPUs.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters.

Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022

2021

Zico: Efficient GPU Memory Sharing for Concurrent DNN Training.

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

2020

Distributed Graph Computation Meets Machine Learning.

IEEE Trans. Parallel Distributed Syst., 2020

Focusing More on Conflicts with Mis-Predictions Helps Language Pre-Training.

CoRR, 2020

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

An empirical study on program failures of deep learning jobs.

Proceedings of the ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June, 2020

2019

BeamRaster: A Practical Fast Massive MU-MIMO System With Pre-Computed Precoders.

IEEE Trans. Mob. Comput., 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.

Myeongjae Jeon

Shivaram Venkataraman

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity.

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Balanced Sparsity for Efficient DNN Inference on GPU.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Gandiva: Introspective Cluster Scheduling for Deep Learning.

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Scheduling CPU for GPU-based Deep Learning Jobs.

Proceedings of the ACM Symposium on Cloud Computing, 2018

2017

KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC.

Proceedings of the 26th Symposium on Operating Systems Principles, 2017

Tux²: Distributed Graph Computation for Machine Learning.

Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Memory Efficient Loss Recovery for Hardware-based Transport in Datacenter.

Proceedings of the First Asia-Pacific Workshop on Networking, 2017

2015

GraM: scaling graph computation to the trillions.

Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015
