Fan Yang

Affiliations:

Chinese Academy of Sciences, Institute of Automation, Foundation Model Research Center, Beijing, China
SenseTime Research (former)

According to our database, Fan Yang authored at least 19 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding.

CoRR, February, 2026

Seg-LLaVA: Empowering pixel-level understanding with large vision language model.

Pattern Recognit., 2026

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation.

CoRR, October, 2025

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation.

CoRR, June, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models.

CoRR, May, 2025

Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection.

CoRR, April, 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning.

CoRR, March, 2025

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models.

CoRR, 2024

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

CoRR, 2024

The Devil is in Details: Delving Into Lite FFN Design for Vision Transformers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

SeqCo-DETR: Sequence Consistency Training for Self-Supervised Object Detection with Transformers.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

Exploring Stochastic Autoregressive Image Modeling for Visual Representation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

MST: Masked Self-Supervised Transformer for Visual Representation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2019

Data Augmentation for Object Detection via Progressive and Selective Instance-Switching.

CoRR, 2019
