Fan Yang

Orcid: 0009-0005-4570-5885

Affiliations:

KuaiShou Inc., Beijing, China

According to our database, Fan Yang authored at least 36 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA.

CoRR, October, 2025

Kwai Keye-VL 1.5 Technical Report.

CoRR, September, 2025

OneRec-V2 Technical Report.

CoRR, August, 2025

Thyme: Think Beyond Images.

CoRR, August, 2025

COMPEER: Controllable Empathetic Reinforcement Reasoning for Emotional Support Conversation.

CoRR, August, 2025

Kwai Keye-VL Technical Report.

CoRR, July, 2025

Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model.

CoRR, July, 2025

OneRec Technical Report.

CoRR, June, 2025

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning.

CoRR, May, 2025

Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation.

CoRR, May, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning.

CoRR, May, 2025

VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform.

CoRR, April, 2025

InstructEngine: Instruction-driven Text-to-Image Alignment.

CoRR, April, 2025

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models.

CoRR, April, 2025

TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs.

CoRR, March, 2025

Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding.

CoRR, March, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.

Amit Anand Amlesahwaram

CoRR, February, 2025

iMOVE: Instance-Motion-Aware Video Understanding.

CoRR, February, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.

CoRR, February, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.

CoRR, February, 2025

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation.

Companion Proceedings of the ACM on Web Conference 2025, 2025

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

iMOVE: Instance-Motion-Aware Video Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Kwai-STaR: Transform LLMs into State-Transition Reasoners.

CoRR, 2024

EVLM: An Efficient Vision-Language Model for Visual Understanding.

CoRR, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model.

CoRR, 2024

Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Multimodal Transformer for Live Streaming Highlight Prediction.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer.

CoRR, 2023

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.

Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 2023

2022

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset.

CoRR, 2022

MLTR: Multi-Label Classification with Transformer.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

2021

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Time Series Data Augmentation for Deep Learning: A Survey.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
