Fan Yang

Orcid: 0009-0005-4570-5885

Affiliations:
  • KuaiShou Inc., Beijing, China


According to our database, Fan Yang authored at least 52 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning.
CoRR, May, 2026

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating.
CoRR, May, 2026

From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding.
CoRR, March, 2026

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL.
CoRR, February, 2026

CREM: Compression-Driven Representation Enhancement for Multimodal Retrieval and Comprehension.
CoRR, February, 2026

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos.
CoRR, February, 2026

ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding.
CoRR, February, 2026


Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations.
CoRR, December, 2025

Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding.
CoRR, November, 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding.
CoRR, November, 2025

Kwai Keye-VL 1.5 Technical Report.
CoRR, September, 2025

OneRec-V2 Technical Report.
CoRR, August, 2025

Thyme: Think Beyond Images.
CoRR, August, 2025

COMPEER: Controllable Empathetic Reinforcement Reasoning for Emotional Support Conversation.
CoRR, August, 2025

Kwai Keye-VL Technical Report.
CoRR, July, 2025

Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model.
CoRR, July, 2025

OneRec Technical Report.
CoRR, June, 2025

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning.
CoRR, May, 2025

Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation.
CoRR, May, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning.
CoRR, May, 2025

InstructEngine: Instruction-driven Text-to-Image Alignment.
CoRR, April, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs.
CoRR, March, 2025

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding.
CoRR, March, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.
CoRR, February, 2025

iMOVE: Instance-Motion-Aware Video Understanding.
CoRR, February, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.
CoRR, February, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.
CoRR, February, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, 2025

Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

iMOVE: Instance-Motion-Aware Video Understanding.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Kwai-STaR: Transform LLMs into State-Transition Reasoners.
CoRR, 2024

EVLM: An Efficient Vision-Language Model for Visual Understanding.
CoRR, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model.
CoRR, 2024

Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Multimodal Transformer for Live Streaming Highlight Prediction.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023
ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer.
CoRR, 2023

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.
Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 2023

2022
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.
CoRR, 2022

MLTR: Multi-Label Classification with Transformer.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

2021
Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Time Series Data Augmentation for Deep Learning: A Survey.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

2019
A High Performance Text Vector Similarity Search Method Based on Overlapping Degree.
Proceedings of the 2019 International Conference on Data Mining Workshops, 2019

