Yue Guan

ORCID: 0009-0005-7433-2627

Affiliations:
  • Shanghai Jiao Tong University, Department of Computer Science and Engineering, China


According to our database, Yue Guan authored at least 18 papers between 2020 and 2025.


Bibliography

2025
Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting.
CoRR, October, 2025

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows.
CoRR, July, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025

An Efficient Private GPT Never Autoregressively Decodes.
CoRR, May, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the 2022 International Conference on Supercomputing (ICS '22), 2022

Transkimmer: Transformer Learns to Layer-wise Skim.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Block-Skim: Efficient Question Answering for Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020
Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm.
IEICE Trans. Electron., 2020

Accelerating sparse DNN models without hardware-support via tile-wise sparsity.
Proceedings of the International Conference for High Performance Computing, 2020

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.
Proceedings of the 28th International Conference on Computational Linguistics, 2020
