Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.

Feng Li

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EgoLife: Towards Egocentric Life Assistant.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Video Instruction Tuning With Synthetic Data.

CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.

CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.

CoRR, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning.

Christopher Arif Setiadharma

Jingkang Yang

Ziwei Liu

CoRR, 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Exploring the Integration of Light and Music in Artistic Furniture Design: A Study in Interaction Design Informed by Children's Climbing Behavior.

Minyu Li

Yuanhan Zhang

Ao Qi

Proceedings of the Human-Computer Interaction, 2024

Octopus: Embodied Vision-Language Programmer from Environmental Feedback.

Proceedings of the Computer Vision - ECCV 2024, 2024

FunQA: Towards Surprising Video Comprehension.

Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?

Proceedings of the Computer Vision - ECCV 2024, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

OtterHD: A High-Resolution Multi-modality Model.

CoRR, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images.

CoRR, 2023

FunQA: Towards Surprising Video Comprehension.

CoRR, 2023

MIMIC-IT: Multi-Modal In-Context Instruction Tuning.

CoRR, 2023

Learning without Forgetting for Vision-Language Models.

CoRR, 2023

Latent Distribution Adjusting for Face Anti-Spoofing.

CoRR, 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning.

CoRR, 2023

What Makes Good Examples for Visual In-Context Learning?

Yuanhan Zhang

Kaiyang Zhou

Ziwei Liu

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

3D Point Cloud Pre-training with Knowledge Distillation from 2D Images.

CoRR, 2022

On-Device Domain Generalization.

CoRR, 2022

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results.

CoRR, 2021

2020

CelebA-Spoof: Large-Scale Face Anti-spoofing Dataset with Rich Annotations.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Makeup based on segmentation and local transfer.

Proceedings of the 6th International Conference on Behavioral, 2019

Yuanhan Zhang
