Yufei Zhan

Orcid: 0009-0002-1377-8519

According to our database, Yufei Zhan authored at least 20 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding.
CoRR, February, 2026

Seg-LLaVA: Empowering pixel-level understanding with large vision language model.
Pattern Recognit., 2026

REFORMamba-Unet: Hierarchical gated refocusing convolution and Mamba-Based u-net for PECTPA 3D medical image.
Biomed. Signal Process. Control., 2026

Baseline Method of the Foundation Model Challenge for Ultrasound Image Analysis.
Proceedings of the 23rd IEEE International Symposium on Biomedical Imaging, 2026

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Unleashing Perception-Time Scaling to Multimodal Reasoning Models.
CoRR, October, 2025

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models.
CoRR, June, 2025

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation.
CoRR, June, 2025

VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?
CoRR, June, 2025

GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking.
CoRR, June, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models.
CoRR, May, 2025

Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning.
CoRR, March, 2025

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

UIOrchestra: Generating High-Fidelity Code from UI Designs with a Multi-agent System.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2024
Relation-Associated Instructions & Hallucination Benchmark.
Dataset, July, 2024

Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models.
CoRR, 2024

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring.
CoRR, 2024

Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Mitigating Hallucination in Visual Language Models with Visual Supervision.
CoRR, 2023

2021
Learning Region-Based Attention Network for Traffic Sign Recognition.
Sensors, 2021

