Haibin Lin

ORCID: 0000-0003-4879-5335

According to our database, Haibin Lin authored at least 49 papers between 2013 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Verify Distributed Deep Learning Model Implementation Refinement with Iterative Relation Inference.
CoRR, August, 2025

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding.
CoRR, June, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
CoRR, May, 2025

Seed1.5-VL Technical Report.
CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April, 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training.
CoRR, April, 2025

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks.
CoRR, April, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism.
CoRR, April, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.
CoRR, March, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale.
CoRR, March, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs.
CoRR, February, 2025

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.
CoRR, February, 2025

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

Understanding Stragglers in Large Model Training Using What-if Analysis.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

Minder: Faulty Machine Detection for Large-scale Distributed Model Training.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

HybridFlow: A Flexible and Efficient RLHF Framework.
Proceedings of the Twentieth European Conference on Computer Systems, 2025

2024
ByteCheckpoint: A Unified Checkpointing System for LLM Development.
CoRR, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.
CoRR, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
CoRR, 2024

POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

LEMON: Lossless model expansion.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023

2022
Espresso: Revisiting Gradient Compression from the System Perspective.
CoRR, 2022

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training.
CoRR, 2022

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

ResNeSt: Split-Attention Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

The World of 5G - Volume 5: Intelligent Medicine.
World Scientific, ISBN: 9789811244216, 2022

2021
Compressed Communication for Distributed Training: Adaptive Methods and System.
CoRR, 2021

2020
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
J. Mach. Learn. Res., 2020

Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes.
CoRR, 2020

Is Network the Bottleneck of Distributed Training?
Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020

CSER: Communication-efficient SGD with Error Reset.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Temporal-Contextual Recommendation in Real-Time.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates.
CoRR, 2019

Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs.
CoRR, 2019

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
CoRR, 2019

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources.
CoRR, 2019

Just-in-Time Dynamic-Batching.
CoRR, 2019

Dive into Deep Learning for Natural Language Processing.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2017
Self-Driving Database Management Systems.
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017

2013
Salience-based feature preserving resizing for 3D models.
Proceedings of the SIGGRAPH Asia 2013, 2013

3D reconstruction of complex geometric solids from 2D line drawings.
Proceedings of the SIGGRAPH Asia 2013, 2013

Visual Saliency Guided Global and Local Resizing for 3D Models.
Proceedings of the 2013 International Conference on Computer-Aided Design and Computer Graphics, 2013

