Haibin Lin

ORCID: 0000-0003-4879-5335

According to our database, Haibin Lin authored at least 66 papers between 2013 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism.

Co-authors include Cesar A. Stuardo and Mohamed S. Abdelfattah.

CoRR, May, 2026

SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache.

Co-authors include Mohamed S. Abdelfattah.

CoRR, January, 2026

MegaScale-Data: Scaling DataLoader for Multisource Large Foundation Model Training.

Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

Laminar: A Scalable Asynchronous RL Post-Training Framework.

Co-authors include Guangming Sheng.

Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

Proceedings of the 21st European Conference on Computer Systems, 2026

SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference.

Co-authors include Chengquan Jiang.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

It Takes Two to Entangle.

Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning.

CoRR, October, 2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution.

CoRR, September, 2025

veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD.

CoRR, September, 2025

Verify Distributed Deep Learning Model Implementation Refinement with Iterative Relation Inference.

CoRR, August, 2025

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding.

Co-authors include Chengquan Jiang.

CoRR, June, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.

CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.

CoRR, April, 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training.

CoRR, April, 2025

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks.

Co-authors include Cheng-Xiang Wang.

CoRR, April, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism.

Co-authors include Cesar A. Stuardo.

CoRR, April, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale.

Co-authors include Guangming Sheng.

CoRR, March, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs.

CoRR, February, 2025

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation.

Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Robust LLM Training Infrastructure at ByteDance.

Co-authors include Guangming Sheng and Yongqiang Zhang.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

MegaScale-Infer: Efficient Mixture-of-Experts Model Serving with Disaggregated Expert Parallelism.

Co-authors include Cesar A. Stuardo.

Proceedings of the ACM SIGCOMM 2025 Conference, 2025

From ATOP to ZCube: Automated Topology Optimization Pipeline and A Highly Cost-Effective Network Topology for Large Model Training.

Proceedings of the ACM SIGCOMM 2025 Conference, 2025

ByteScale: Communication-Efficient Scaling of LLM Training with a 2048K Context Length on 16384 GPUs.

Proceedings of the ACM SIGCOMM 2025 Conference, 2025

Understanding Stragglers in Large Model Training Using What-if Analysis.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

Minder: Faulty Machine Detection for Large-scale Distributed Model Training.

Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale.

Co-authors include Guangming Sheng.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DUO: No Compromise to Accuracy Degradation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.

Co-authors include Chengquan Jiang.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

HybridFlow: A Flexible and Efficient RLHF Framework.

Co-authors include Guangming Sheng.

Proceedings of the Twentieth European Conference on Computer Systems, 2025

SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization.

Proceedings of the IEEE International Conference on Cluster Computing, 2025

2024

ByteCheckpoint: A Unified Checkpointing System for LLM Development.

CoRR, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.

Co-authors include Chengquan Jiang.

CoRR, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

CoRR, 2024

POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.

Co-authors include Chengming Zhang.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

LEMON: Lossless model expansion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023

Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies.

Co-authors include T. S. Eugene Ng.

Proceedings of the Eighteenth European Conference on Computer Systems, 2023

2022

Espresso: Revisiting Gradient Compression from the System Perspective.

Co-authors include T. S. Eugene Ng.

CoRR, 2022

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training.

CoRR, 2022

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training.

Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

ResNeSt: Split-Attention Networks.

Co-authors include Alexander J. Smola.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

The World of 5G - Volume 5: Intelligent Medicine

World Scientific, ISBN: 9789811244216, 2022

2021

Compressed Communication for Distributed Training: Adaptive Methods and System.

CoRR, 2021

2020

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.

J. Mach. Learn. Res., 2020

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes.

CoRR, 2020

Is Network the Bottleneck of Distributed Training?

Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

CSER: Communication-efficient SGD with Error Reset.

Co-authors include Oluwasanmi Koyejo.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Temporal-Contextual Recommendation in Real-Time.

Co-authors include Balakrishnan (Murali) Narayanaswamy.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019

Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates.

Co-authors include Oluwasanmi Koyejo.

CoRR, 2019

Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs.

Co-authors include Alexander J. Smola.

CoRR, 2019

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.

CoRR, 2019

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources.

CoRR, 2019

Just-in-Time Dynamic-Batching.

CoRR, 2019

Dive into Deep Learning for Natural Language Processing.

Co-authors include Alexander J. Smola.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2017

Self-Driving Database Management Systems.

Co-authors include Prashanth Menon, Siddharth Santurkar, and Anthony Tomasic.

Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2013

Salience-based feature preserving resizing for 3D models.

Proceedings of the SIGGRAPH Asia 2013, 2013

3D reconstruction of complex geometric solids from 2D line drawings.

Proceedings of the SIGGRAPH Asia 2013, 2013

Visual Saliency Guided Global and Local Resizing for 3D Models.

Proceedings of the 2013 International Conference on Computer-Aided Design and Computer Graphics, 2013
