Yue Guan
ORCID: 0009-0005-7433-2627
Affiliations:
- Shanghai Jiao Tong University, Department of Computer Science and Engineering, China
According to our database, Yue Guan authored at least 27 papers between 2020 and 2026.
Online presence:
- on orcid.org
Bibliography
2026
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments.
CoRR, May, 2026
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration.
CoRR, May, 2026
CoRR, February, 2026
ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management.
CoRR, January, 2026
AutoOverlap: Enabling Fine-Grained Overlap of Computation and Communication with Chunk-Based Scheduling.
CoRR, January, 2026
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026
2025
Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting.
CoRR, October, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
CoRR, May, 2025
Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.
Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025
KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
Proceedings of the Eighth Conference on Machine Learning and Systems, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
2024
Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2020
Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm.
IEICE Trans. Electron., 2020
Proceedings of the International Conference for High Performance Computing, 2020
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.
Proceedings of the 28th International Conference on Computational Linguistics, 2020