Haibin Lin
ORCID: 0000-0003-4879-5335
According to our database, Haibin Lin authored at least 49 papers between 2013 and 2025.
Bibliography
2025
Verify Distributed Deep Learning Model Implementation Refinement with Iterative Relation Inference.
CoRR, August, 2025
SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding.
CoRR, June, 2025
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
CoRR, May, 2025
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April, 2025
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training.
CoRR, April, 2025
CoRR, April, 2025
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism.
CoRR, April, 2025
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.
CoRR, March, 2025
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs.
CoRR, February, 2025
CoRR, February, 2025
Proceedings of the 2025 USENIX Annual Technical Conference, 2025
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025
Proceedings of the Twentieth European Conference on Computer Systems, 2025
2024
CoRR, 2024
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
CoRR, 2024
POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Nineteenth European Conference on Computer Systems, 2024
2023
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023
2022
dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training.
CoRR, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022
WorldScientific, ISBN: 9789811244216, 2022
2021
CoRR, 2021
2020
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
J. Mach. Learn. Res., 2020
Proceedings of the 2020 Workshop on Network Meets AI & ML, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates.
CoRR, 2019
CoRR, 2019
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
CoRR, 2019
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources.
CoRR, 2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
2017
Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, 2017
2013
Proceedings of the SIGGRAPH Asia 2013, 2013
Proceedings of the SIGGRAPH Asia 2013, 2013
Proceedings of the 2013 International Conference on Computer-Aided Design and Computer Graphics, 2013