Yuanhan Zhang

Orcid: 0000-0002-9063-7886

According to our database, Yuanhan Zhang authored at least 39 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Otter: A Multi-Modal Model With In-Context Instruction Tuning.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy.
Int. J. Comput. Vis., August, 2025

Neural Prompt Search.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding.
CoRR, July, 2025

Learning Without Forgetting for Vision-Language Models.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2025

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness.
CoRR, March, 2025

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos.
CoRR, January, 2025

Long Context Transfer from Language to Vision.
Trans. Mach. Learn. Res., 2025

LLaVA-Video: Video Instruction Tuning With Synthetic Data.
Trans. Mach. Learn. Res., 2025

LLaVA-OneVision: Easy Visual Task Transfer.
Trans. Mach. Learn. Res., 2025

Robust face anti-spoofing with Dual Probabilistic Modeling.
Pattern Recognit., 2025

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025


2024
Video Instruction Tuning With Synthetic Data.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning.
CoRR, 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Exploring the Integration of Light and Music in Artistic Furniture Design: A Study in Interaction Design Informed by Children's Climbing Behavior.
Proceedings of the Human-Computer Interaction, 2024

Octopus: Embodied Vision-Language Programmer from Environmental Feedback.
Proceedings of the Computer Vision - ECCV 2024, 2024

FunQA: Towards Surprising Video Comprehension.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
OtterHD: A High-Resolution Multi-modality Model.
CoRR, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images.
CoRR, 2023

FunQA: Towards Surprising Video Comprehension.
CoRR, 2023

MIMIC-IT: Multi-Modal In-Context Instruction Tuning.
CoRR, 2023

Learning without Forgetting for Vision-Language Models.
CoRR, 2023

Latent Distribution Adjusting for Face Anti-Spoofing.
CoRR, 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning.
CoRR, 2023

What Makes Good Examples for Visual In-Context Learning?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images.
CoRR, 2022

On-Device Domain Generalization.
CoRR, 2022

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results.
CoRR, 2021

2020
CelebA-Spoof: Large-Scale Face Anti-spoofing Dataset with Rich Annotations.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Makeup based on segmentation and local transfer.
Proceedings of the 6th International Conference on Behavioral, 2019

