Xupeng Miao

ORCID: 0000-0002-9371-8358

According to our database, Xupeng Miao authored at least 47 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning.
CoRR, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models.
CoRR, 2024

Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Generative Dense Retrieval: Memory Can Be a Burden.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Experimental Analysis of Large-scale Learnable Vector Storage Compression.
Proc. VLDB Endow., December, 2023

P²CG: a privacy preserving collaborative graph neural network training framework.
VLDB J., July, 2023

Hetu: a highly efficient automatic parallel distributed deep learning system.
Sci. China Inf. Sci., January, 2023

Lasagne: A Multi-Layer Graph Convolutional Network Framework via Node-Aware Deep Architecture.
IEEE Trans. Knowl. Data Eng., 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent.
Proc. VLDB Endow., 2023

SDPipe: A Semi-Decentralized Framework for Heterogeneity-aware Pipeline-parallel Training.
Proc. VLDB Endow., 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Proc. ACM Manag. Data, 2023

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems.
CoRR, 2023

SpotServe: Serving Generative Large Language Models on Preemptible Instances.
CoRR, 2023

Model-enhanced Vector Index.
CoRR, 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
CoRR, 2023

FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference.
CoRR, 2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.
CoRR, 2023

EINNET: Optimizing Tensor Programs with Derivation-Based Transformations.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Model-enhanced Vector Index.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems, 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs.
IEEE Trans. Knowl. Data Eng., 2022

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism.
Proc. VLDB Endow., 2022

Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Update.
Proc. VLDB Endow., 2022

Distributed Graph Neural Network Training: A Survey.
CoRR, 2022

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
CoRR, 2022

Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates.
CoRR, 2022

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System.
CoRR, 2022

HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training.
Proceedings of the SIGMOD '22: International Conference on Management of Data, 2022

TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Lasagne: A Multi-Layer Graph Convolutional Network Framework via Node-aware Deep Architecture (Extended Abstract).
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Zoomer: Boosting Retrieval on Web-scale Graphs by Regions of Interest.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

HET-KG: Communication-Efficient Knowledge Graph Embedding Training via Hotness-Aware Cache.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scalable Graph Sampling on GPUs with Compressed Graph.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs.
VLDB J., 2021

HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework.
Proc. VLDB Endow., 2021

Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

ROD: Reception-aware Online Distillation for Sparse Graphs.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

DeGNN: Improving Graph Neural Networks with Graph Decomposition.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract).
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

2020
Reliable Data Distillation on Graph Convolutional Network.
Proceedings of the 2020 International Conference on Management of Data, 2020

Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs.
Proceedings of the 2020 International Conference on Management of Data, 2020

PSGraph: How Tencent trains extremely large-scale graphs with Spark?
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

2019
PS2: Parameter Server on Spark.
Proceedings of the 2019 International Conference on Management of Data, 2019
