Yan Gao

Orcid: 0009-0004-5960-1684

Affiliations:

Xiaohongshu Inc., Beijing, China
Alibaba Group, Beijing, China
Chinese Academy of Sciences, Institute of Computing Technology, State Key Laboratory of Computer Architecture, Beijing, China

According to our database, Yan Gao authored at least 43 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision.

CoRR, October, 2025

RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios.

CoRR, September, 2025

SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment.

CoRR, September, 2025

Decomposed Reasoning with Reinforcement Learning for Relevance Assessment in UGC Platforms.

CoRR, August, 2025

RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services.

CoRR, July, 2025

Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM.

CoRR, June, 2025

PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

MoDification: Mixture of Depths Made Easy.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

NoteLLM-2: Multimodal Large Representation Models for Recommendation.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

ZigZagKV: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards the Law of Capacity Gap in Distilling Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.

Int. J. Comput. Vis., November, 2024

TOMGPT: Reliable Text-Only Training Approach for Cost-Effective Multi-modal Large Language Model.

ACM Trans. Knowl. Discov. Data, August, 2024

ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval.

CoRR, 2024

Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents.

CoRR, 2024

NoteLLM-2: Multimodal Large Representation Models for Recommendation.

CoRR, 2024

From Image to Video, what do we need in multimodal LLMs?

CoRR, 2024

Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior.

CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.

CoRR, 2024

NoteLLM: A Retrievable Large Language Model for Note Recommendation.

Companion Proceedings of the ACM on Web Conference 2024, 2024

Vript: A Video Is Worth Thousands of Words.

Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.

Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Are Mixture-of-Modality-Experts Transformers Robust to Missing Modality During Training and Inferring?

Yan Gao, Tong Xu, Enhong Chen

Intelligent Information Processing XII, 2024

Caseg: Clip-Based Action Segmentation With Learnable Text Prompt.

Proceedings of the IEEE International Conference on Image Processing, 2024

DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.

Findings of the Association for Computational Linguistics, 2024

Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval.

Findings of the Association for Computational Linguistics, 2024

AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Towards the Law of Capacity Gap in Distilling Language Models.

CoRR, 2023

MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation.

CoRR, 2023

MVP-SEG: Multi-view Prompt Learning for Open-Vocabulary Semantic Segmentation.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

2INER: Instructive and In-Context Learning on Few-Shot Named Entity Recognition.

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

OvarNet: Towards Open-Vocabulary Object Attribute Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Occluded Video Instance Segmentation: A Benchmark.

Int. J. Comput. Vis., 2022

Unified QA-aware Knowledge Graph Generation Based on Multi-modal Modeling.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

NFormer: Robust Person Re-identification with Neighbor Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Occluded Video Instance Segmentation.

CoRR, 2021

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Decoupled IoU Regression for Object Detection.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020

1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation.

CoRR, 2020

2019

Utilizing the Instability in Weakly Supervised Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019
