Chaoyi Jiang

CoRR, April, 2026

2025

Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding.

Arun Ramachandran

Ramaswamy Govindarajan

Prakash Sathyanath Raghavendra

CoRR, November, 2025

DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing.

Daniel Wong

CoRR, November, 2025

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning.

Murali Annavaram

CoRR, October, 2025

MARché: Fast Masked Autoregressive Image Generation with Cache-Aware Attention.

Sungwoo Kim

Won Woo Ro

CoRR, June, 2025

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding.

CoRR, April, 2025

LEAF: Lightweight, Efficient, Adaptive and Flexible Embedding for Large-Scale Recommendation Models.

Abdulla Alshabanah

Proceedings of the Nineteenth ACM Conference on Recommender Systems, 2025

Efficient Processing of Dynamic Rank-Happiness Maximization Queries.

Jiping Zheng

Proceedings of the Web and Big Data - 9th International Joint Conference, 2025

KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Balancing Fairness Among User Groups in Happiness Maximization Queries.

Proceedings of the Web Information Systems and Applications, 2025

2024

Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation.

CoRR, 2024

CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data.

Abdulla Alshabanah