Fan Yang

Affiliations:
  • Chinese Academy of Sciences, Institute of Automation, Foundation Model Research Center, Beijing, China
  • SenseTime Research (former)


According to our database, Fan Yang authored at least 19 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding.
CoRR, February, 2026

Seg-LLaVA: Empowering pixel-level understanding with large vision language model.
Pattern Recognit., 2026

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation.
CoRR, October, 2025

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation.
CoRR, June, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models.
CoRR, May, 2025

Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection.
CoRR, April, 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning.
CoRR, March, 2025

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models.
CoRR, 2024

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.
CoRR, 2024

The Devil is in Details: Delving Into Lite FFN Design for Vision Transformers.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
SeqCo-DETR: Sequence Consistency Training for Self-Supervised Object Detection with Transformers.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Exploring Stochastic Autoregressive Image Modeling for Visual Representation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
MST: Masked Self-Supervised Transformer for Visual Representation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2019
Data Augmentation for Object Detection via Progressive and Selective Instance-Switching.
CoRR, 2019
