Yufei Zhan

Orcid: 0009-0002-1377-8519

According to our database¹, Yufei Zhan authored at least 20 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding.

CoRR, February, 2026

Seg-LLaVA: Empowering pixel-level understanding with large vision language model.

Pattern Recognit., 2026

REFORMamba-Unet: Hierarchical gated refocusing convolution and Mamba-Based u-net for PECTPA 3D medical image.

Biomed. Signal Process. Control., 2026

Baseline Method of the Foundation Model Challenge for Ultrasound Image Analysis.

Proceedings of the 23rd IEEE International Symposium on Biomedical Imaging, 2026

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Unleashing Perception-Time Scaling to Multimodal Reasoning Models.

CoRR, October, 2025

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models.

CoRR, June, 2025

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation.

CoRR, June, 2025

VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?

CoRR, June, 2025

GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking.

CoRR, June, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models.

CoRR, May, 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning.

CoRR, March, 2025

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

UIOrchestra: Generating High-Fidelity Code from UI Designs with a Multi-agent System.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2024

Relation-Associated Instructions & Hallucination Benchmark.

Dataset, July, 2024

Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models.

CoRR, 2024

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.

CoRR, 2024

Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Mitigating Hallucination in Visual Language Models with Visual Supervision.

CoRR, 2023

2021

Learning Region-Based Attention Network for Traffic Sign Recognition.

Ke Zhou, Yufei Zhan, Dongmei Fu

Sensors, 2021