Zaifeng Pan

ORCID: 0000-0002-6759-2616

According to our database, Zaifeng Pan authored at least 20 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving.
CoRR, April, 2026

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training.
CoRR, April, 2026

Pancake: Hierarchical Memory System for Multi-Agent LLM Serving.
CoRR, February, 2026

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management.
CoRR, January, 2026

ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design.
CoRR, January, 2026

2025
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding.
CoRR, December, 2025

HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving.
CoRR, July, 2025

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows.
CoRR, July, 2025

PluS: Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules.
Proceedings of the 2025 USENIX Annual Technical Conference, 2025

HedraRAG: Co-Optimizing Generation and Retrieval for Heterogeneous RAG Workflows.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference.
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

2024
Compressed data direct computing for Chinese dataset on DCU.
CCF Trans. High Perform. Comput., April, 2024

RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules.
Proceedings of the International Conference for High Performance Computing, 2024

2023
BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach.
Proc. ACM Manag. Data, September, 2023

RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Exploring Data Analytics Without Decompression on Embedded GPU Systems.
IEEE Trans. Parallel Distributed Syst., 2022

G-SLIDE: A GPU-Based Sub-Linear Deep Learning Engine via LSH Sparsification.
IEEE Trans. Parallel Distributed Syst., 2022

2021
G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021
