Fan Yang

ORCID: 0000-0002-0378-060X

Affiliations:

Microsoft Research Asia, Beijing, China
Nanjing University, Department of Computer Science, State Key Lab for Novel Software Technology, China (former)

According to our database, Fan Yang authored at least 104 papers between 2003 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of three.

Bibliography

2026

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference.

Proc. VLDB Endow., January, 2026

Neuro-Symbolic Verification on Instruction Following of LLMs.

CoRR, January, 2026

MetaAttention: A Unified and Performant Attention Framework across Hardware Backends.

Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026

2025

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts.

CoRR, October, 2025

rStar2-Agent: Agentic Reasoning Technical Report.

CoRR, August, 2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning.

CoRR, June, 2025

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset.

CoRR, May, 2025

TileLang: A Composable Tiled Programming Model for AI Systems.

CoRR, April, 2025

AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms.

CoRR, February, 2025

WaferLLM: A Wafer-Scale LLM Inference System.

CoRR, February, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.

CoRR, January, 2025

AutoVerus: Automated Proof Generation for Rust Code.

Chenyuan Yang

Xuheng Li

Md Rakib Hossain Misu

Proc. ACM Program. Lang., 2025

TrainVerify: Equivalence-Based Verification for Distributed LLM Training.

Proceedings of the ACM SIGOPS 31st Symposium on Operating Systems Principles, 2025

WaferLLM: Large Language Model Inference at Wafer Scale.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

PipeThreader: Software-Defined Pipelining for Efficient DNN Execution.

Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, 2025

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

LongRoPE2: Near-Lossless LLM Context Window Scaling.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Automated Proof Generation for Rust Code via Self-Evolution.

Md Rakib Hossain Misu

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025

2024

Efficient Schedule Construction for Distributed Execution of Large DNN Models.

IEEE Trans. Parallel Distributed Syst., December, 2024

Baichuan Alignment Technical Report.

CoRR, 2024

SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.

CoRR, 2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.

CoRR, 2024

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers.

CoRR, 2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.

CoRR, 2024

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs.

CoRR, 2024

OneSparse: A Unified System for Multi-index Vector Search.

Companion Proceedings of the ACM on Web Conference 2024, 2024

MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels.

Harsha Vardhan Simhadri

Companion Proceedings of the ACM on Web Conference 2024, 2024

Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor.

Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

Efficient Deployment of Large Language Model across Cloud-Device Systems.

Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Understanding the Weakness of Large Language Model Agents within a Complex Android Environment.

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

GLITCHES: GPU-FPGA LLM Inference Through a Collaborative Heterogeneous System.

Proceedings of the IEEE High Performance Extreme Computing Conference, 2024

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation.

Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Fewer is More: Boosting Math Reasoning with Reinforced Context Pruning.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

IRGen: Generative Modeling for Image Retrieval.

Proceedings of the Computer Vision - ECCV 2024, 2024

Amanda: Unified Instrumentation Framework for Deep Neural Networks.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

BitNet: Scaling 1-bit Transformers for Large Language Models.

CoRR, 2023

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models.

CoRR, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.

CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.

CoRR, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction.

CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Optimizing Dynamic Neural Networks with Brainstorm.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Welder: Scheduling Deep Learning Memory Access via Tile-graph.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

On Modular Learning of Distributed Systems for Predicting End-to-End Latency.

Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

Model-enhanced Vector Index.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

Tutel: Adaptive Mixture-of-Experts at Scale.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters.

Proceedings of the Eighteenth European Conference on Computer Systems, 2023

Adam Accumulation to Reduce Memory Footprints of Both Activations and Gradients for Large-Scale DNN Training.

Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning.

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Gating PatternPyramid for diversified image style transfer.

J. Electronic Imaging, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.

CoRR, 2022

PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions.

CoRR, 2021

2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation.

CoRR, 2020

HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Retiarii: A Deep Learning Exploratory-Training Framework.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Capuchin: Tensor-based GPU Memory Management for Deep Learning.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory.

J. Comput. Sci. Technol., 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.

Myeongjae Jeon

Shivaram Venkataraman

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

2018

Gandiva: Introspective Cluster Scheduling for Deep Learning.

Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Scheduling CPU for GPU-based Deep Learning Jobs.

Proceedings of the ACM Symposium on Cloud Computing, 2018

2015

ImmortalGraph: A System for Storage and Analysis of Temporal Graphs.

ACM Trans. Storage, 2015

GraM: scaling graph computation to the trillions.

Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

2014

Chronos: a graph engine for temporal graph analysis.

Proceedings of the Ninth Eurosys Conference 2014, 2014

2012

Kineograph: taking the pulse of a fast-changing and connected world.

Proceedings of the European Conference on Computer Systems, 2012

2007

Modeling path capacity in multi-hop IEEE 802.11 networks for QoS services.

IEEE Trans. Wirel. Commun., 2007

Distributed Cooperative Rate Adaptation for Energy Efficiency in IEEE 802.11-Based Multihop Networks.

IEEE Trans. Veh. Technol., 2007

Cooperative and opportunistic transmission for wireless ad hoc networks.

IEEE Netw., 2007

2006

LION: Layered Overlay Multicast With Network Coding.

IEEE Trans. Multim., 2006

Distributed Channel Assignment and Routing in Multiradio Multichannel Multihop Wireless Networks.

IEEE J. Sel. Areas Commun., 2006

Distributed cooperative rate adaptation for energy efficiency in IEEE 802.11-based multi-hop networks.

Proceedings of the 3rd International ICST Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, 2006

Modeling Path Capacity in Multi-hop IEEE 802.11 Networks for QoS Services.

Proceedings of the IEEE 3rd International Conference on Mobile Adhoc and Sensor Systems, 2006

Impact of Power and Rate Selection on the Throughput of Ad Hoc Networks.

Proceedings of IEEE International Conference on Communications, 2006

On Improving the Throughput of Media Delivery Applications in Heterogeneous Overlay Network.

Proceedings of the Global Telecommunications Conference, 2006. GLOBECOM '06, San Francisco, CA, USA, 27 November, 2006

2005

Cross-layer QoS Support for Multimedia Delivery over Wireless Internet.

Qian Zhang

Fan Yang

Wenwu Zhu

EURASIP J. Adv. Signal Process., 2005

AMTP: a multipath multimedia streaming protocol for mobile ad hoc networks.

Kultida Rojviboonchai

Proceedings of IEEE International Conference on Communications, 2005

2004

End-to-end TCP-friendly streaming protocol and bit allocation for scalable video over wireless Internet.

IEEE J. Sel. Areas Commun., 2004

Streaming and Bit Allocation for Scalable Video over Mobile Wireless Internet.

Proceedings of IEEE INFOCOM 2004, 2004

2003

An end-to-end TCP-friendly streaming protocol for multimedia over wireless Internet.

Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003
