Xin Liu

Orcid: 0009-0004-0341-3860

Affiliations:
  • East China Normal University, Shanghai, China
  • Shanghai AI Laboratory, Shanghai, China


According to our database, Xin Liu authored at least 49 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
veScale-FSDP: Flexible and High-Performance FSDP at Scale.
CoRR, February, 2026

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training.
CoRR, January, 2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production.
Proceedings of the 21st European Conference on Computer Systems, 2026

Laminar: A Scalable Asynchronous RL Post-Training Framework.
Proceedings of the 21st European Conference on Computer Systems, 2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
Proceedings of the 21st European Conference on Computer Systems, 2026

SwiftSpec: Disaggregated Speculative Decoding and Fused Kernels for Low-Latency LLM Inference.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

OmniScale: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving.
IEEE Trans. Parallel Distributed Syst., September, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning.
CoRR, September, 2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution.
CoRR, September, 2025

LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving.
CoRR, September, 2025

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding.
CoRR, August, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo.
CoRR, August, 2025

SwiftSpec: Ultra-Low Latency LLM Decoding by Scaling Asynchronous Speculative Decoding.
CoRR, June, 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production.
CoRR, May, 2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April, 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training.
CoRR, April, 2025

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism.
CoRR, April, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs.
CoRR, February, 2025


MegaScale-Infer: Efficient Mixture-of-Experts Model Serving with Disaggregated Expert Parallelism.
Proceedings of the ACM SIGCOMM 2025 Conference, 2025

ByteScale: Communication-Efficient Scaling of LLM Training with a 2048K Context Length on 16384 GPUs.
Proceedings of the ACM SIGCOMM 2025 Conference, 2025

LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025

Understanding Stragglers in Large Model Training Using What-if Analysis.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.
Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, 2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

A Comprehensive Overhaul of Multimodal Assistant with Small Language Models.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
ByteCheckpoint: A Unified Checkpointing System for LLM Development.
CoRR, 2024

MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
CoRR, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.
CoRR, 2024

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation.
CoRR, 2024

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models.
CoRR, 2024

MuxFlow: efficient GPU sharing in production-level clusters with more than 10000 GPUs.
Sci. China Inf. Sci., 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Safety of Multimodal Large Language Models on Images and Text.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Query-Relevant Images Jailbreak Large Multi-Modal Models.
CoRR, 2023

MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters.
CoRR, 2023

vMF Loss: Exploring a Scattered Intra-class Hypersphere for Few-Shot Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Not All Tasks Are Equal: A Parameter-Efficient Task Reweighting Method for Few-Shot Learning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track, 2023

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Recognizable Information Bottleneck.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

2022
Adaptive distribution calibration for few-shot learning via optimal transport.
Inf. Sci., 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

