Yue Guan

Orcid: 0009-0005-7433-2627

Affiliations:

Shanghai Jiao Tong University, Department of Computer Science and Engineering, China

According to our database, Yue Guan authored at least 27 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of four.

Bibliography

2026

TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments.

Nicholas J. Riasanovsky

CoRR, May, 2026

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration.

CoRR, May, 2026

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training.

CoRR, April, 2026

Pancake: Hierarchical Memory System for Multi-Agent LLM Serving.

CoRR, February, 2026

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management.

CoRR, January, 2026

AutoOverlap: Enabling Fine-Grained Overlap of Computation and Communication with Chunk-Based Scheduling.

CoRR, January, 2026

Proton: Towards Multi-level, Adaptive Profiling for Triton.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2026

2025

Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting.

CoRR, October, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.

CoRR, May, 2025

Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference.

Proceedings of the Eighth Conference on Machine Learning and Systems, 2025

An Efficient Private GPT Never Autoregressively Decodes.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

Accelerating Sparse DNNs Based on Tiled GEMM.

IEEE Trans. Computers, May, 2024

Fractal: Joint Multi-Level Sparse Pattern Tuning of Accuracy and Performance for DNN Pruning.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2022

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Transkimmer: Transformer Learns to Layer-wise Skim.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Block-Skim: Efficient Question Answering for Transformer.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020

Co-Design of Binary Processing in Memory ReRAM Array and DNN Model Optimization Algorithm.

Yue Guan

Takashi Ohsawa

IEICE Trans. Electron., 2020

Accelerating sparse DNN models without hardware-support via tile-wise sparsity.

Proceedings of the International Conference for High Performance Computing, 2020

How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention.

Proceedings of the 28th International Conference on Computational Linguistics, 2020
