Fan Yang

ORCID: 0009-0005-4570-5885

Affiliations:

KuaiShou Inc., Beijing, China

According to our database, Fan Yang authored at least 52 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning.

CoRR, May, 2026

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating.

CoRR, May, 2026

From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding.

CoRR, March, 2026

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL.

CoRR, February, 2026

CREM: Compression-Driven Representation Enhancement for Multimodal Retrieval and Comprehension.

CoRR, February, 2026

VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos.

CoRR, February, 2026

ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding.

CoRR, February, 2026

Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations.

Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026

Compressing then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations.

CoRR, December, 2025

Compression then Matching: An Efficient Pre-training Paradigm for Multimodal Embedding.

CoRR, November, 2025

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding.

CoRR, November, 2025

Kwai Keye-VL 1.5 Technical Report.

CoRR, September, 2025

OneRec-V2 Technical Report.

CoRR, August, 2025

Thyme: Think Beyond Images.

CoRR, August, 2025

COMPEER: Controllable Empathetic Reinforcement Reasoning for Emotional Support Conversation.

CoRR, August, 2025

Kwai Keye-VL Technical Report.

CoRR, July, 2025

Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model.

CoRR, July, 2025

OneRec Technical Report.

CoRR, June, 2025

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning.

CoRR, May, 2025

Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation.

CoRR, May, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning.

CoRR, May, 2025

InstructEngine: Instruction-driven Text-to-Image Alignment.

CoRR, April, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs.

CoRR, March, 2025

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding.

CoRR, March, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.

CoRR, February, 2025

iMOVE: Instance-Motion-Aware Video Understanding.

CoRR, February, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

CoRR, February, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.

CoRR, February, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.

Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, 2025

Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

iMOVE: Instance-Motion-Aware Video Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Kwai-STaR: Transform LLMs into State-Transition Reasoners.

CoRR, 2024

EVLM: An Efficient Vision-Language Model for Visual Understanding.

CoRR, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model.

CoRR, 2024

Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Multimodal Transformer for Live Streaming Highlight Prediction.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer.

CoRR, 2023

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.

Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 2023

2022

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.

CoRR, 2022

MLTR: Multi-Label Classification with Transformer.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

2021

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Time Series Data Augmentation for Deep Learning: A Survey.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

2019

A High Performance Text Vector Similarity Search Method Based on Overlapping Degree.

Proceedings of the 2019 International Conference on Data Mining Workshops, 2019
