Guangxuan Xiao

ORCID: 0000-0002-7182-9284

According to our database, Guangxuan Xiao authored at least 22 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2025
FastComposer: Tuning-Free Multi-subject Image Generation with Localized Attention.
Int. J. Comput. Vis., March, 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring.
CoRR, March, 2025

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.
CoRR, February, 2025

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Retrieval Head Mechanistically Explains Long-Context Factuality.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
GetMobile Mob. Comput. Commun., December, 2024

Correction to: Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks.
Mach. Intell. Res., December, 2024

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training.
Proc. VLDB Endow., February, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.
CoRR, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory.
CoRR, 2024

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Streaming Language Models with Attention Sinks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks.
Mach. Intell. Res., April, 2023

Offsite-Tuning: Transfer Learning without Full Model.
CoRR, 2023

ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training.
CoRR, 2023

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models.
CoRR, 2022

Sparse and Local Networks for Hypergraph Reasoning.
Proceedings of the Learning on Graphs Conference, 2022

2021
Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks.
CoRR, 2021

