Chi-Chih Chang

CoRR, May, 2026

DARE: Diffusion Language Model Activation Reuse for Efficient Inference.

Diana Marculescu

CoRR, May, 2026

Faster LLM Inference via Sequential Monte Carlo.

Yahya Emara

Mauricio Barba da Costa

CoRR, April, 2026

Bit-Serial Acceleration of LLM Inference With Mixture-of-Datatype Quantization.

Yuzong Chen

Xilai Dai

Marta Andronic

George A. Constantinides

IEEE Trans. Computers, February, 2026

SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache.

Ziheng Jiang

Xuehai Qian

CoRR, January, 2026

2025

UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs.

Diana Marculescu

CoRR, December, 2025

Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding.

CoRR, December, 2025

Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding.

Kai-Chiang Wu

CoRR, September, 2025

SplitReason: Learning To Offload Reasoning.

Yash Akhauri

Anthony Fei

Yueying Li

CoRR, April, 2025

xKV: Cross-Layer SVD for KV-Cache Compression.

CoRR, March, 2025

TokenButler: Token Importance is Predictable.

Yash Akhauri

Yifei Gao

Nilesh Jain

CoRR, March, 2025

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs.

CoRR, February, 2025

The Power of Negative Zero: Datatype Customization for Quantized Large Language Models.

CoRR, January, 2025

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models.

Diana Marculescu

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Quamba: A Post-Training Quantization Recipe for Selective State Space Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Palu: KV-Cache Compression with Low-Rank Projection.

Kai-Chiang Wu

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration.

Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2025

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

V"Mean"ba: Visual State Space Models only need 1 hidden dimension.

CoRR, 2024

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration.

CoRR, 2024

Palu: Compressing KV-Cache with Low-Rank Projection.

CoRR, 2024

FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Transformer and Its Variants for Identifying Good Dice in Bad Neighborhoods.

Proceedings of the 42nd IEEE VLSI Test Symposium, 2024

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Q-YOLOP: Quantization-Aware You Only Look Once for Panoptic Driving Perception.

Proceedings of the IEEE International Conference on Multimedia and Expo Workshops, 2023

2004

Embedding information within dynamic visual patterns.

Wen-Hung Liao