Aurick Qiao

Orcid: 0009-0004-9119-8696

According to our database, Aurick Qiao authored at least 23 papers between 2014 and 2026.

Bibliography

2026
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving.
CoRR, May, 2026

TAGQuant: Token-Aware Clustering for Group-Wise Quantization.
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads.
Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025
OWL: Overcoming Window Length-Dependence in Speculative Decoding for Long-Context Inputs.
CoRR, October, 2025

Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI.
CoRR, July, 2025

Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences.
CoRR, June, 2025

TALE: Token-Adaptive Low-Rank KVCache Approximation with Reconstruction Elimination.
Trans. Assoc. Comput. Linguistics, 2025

SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Efficiently Scaling LLM Reasoning Programs with Certaindex.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Efficiently Serving LLM Reasoning Programs with Certaindex.
CoRR, 2024

SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference.
CoRR, 2024

Efficient LLM Scheduling by Learning to Rank.
Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
LLM360: Towards Fully Transparent Open-Source LLMs.
CoRR, 2023

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

2021
Elastic Machine Learning Systems with Co-adaptation.
PhD thesis, 2021

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

2020
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
CoRR, 2020

2019
Fault Tolerance in Iterative-Convergent Machine Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
Litz: Elastic Framework for High-Performance Distributed Machine Learning.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

2015
Managed communication and consistency for fast data-parallel iterative analytics.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

2014
Multi-Pivot Quicksort: Theory and Experiments.
Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments, 2014
