Yue Guan
ORCID: 0009-0005-7433-2627
Affiliations:
- Shanghai Jiao Tong University, Department of Computer Science and Engineering, China
According to our database, Yue Guan authored at least 18 papers between 2020 and 2025.
Bibliography
2025
Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting.
CoRR, October, 2025
CoRR, July, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025
Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
2024
Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2020
Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm.
IEICE Trans. Electron., 2020
Proceedings of the International Conference for High Performance Computing, 2020
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.
Proceedings of the 28th International Conference on Computational Linguistics, 2020