Yue Guan

ORCID: 0009-0005-7433-2627

Affiliations:
  • Shanghai Jiao Tong University, Department of Computer Science and Engineering, China


According to our database, Yue Guan authored at least 18 papers between 2020 and 2025.


Bibliography

2025
Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting.
CoRR, October, 2025

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows.
CoRR, July, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025

An Efficient Private GPT Never Autoregressively Decodes.
CoRR, May, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the 2022 International Conference on Supercomputing (ICS '22), 2022

Transkimmer: Transformer Learns to Layer-wise Skim.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Block-Skim: Efficient Question Answering for Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020
Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm.
IEICE Trans. Electron., 2020

Accelerating sparse DNN models without hardware-support via tile-wise sparsity.
Proceedings of the International Conference for High Performance Computing, 2020

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.
Proceedings of the 28th International Conference on Computational Linguistics, 2020
