David Fan

ORCID: 0000-0002-9217-5451

Affiliations:
  • Meta Fundamental AI Research (FAIR), New York, NY, USA
  • Amazon Prime Video, Seattle, WA, USA (former)
  • Princeton University, Vision and Learning Lab, Princeton, NJ, USA (former)


According to our database, David Fan authored at least 18 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Beyond Language Modeling: An Exploration of Multimodal Pretraining.
CoRR, March, 2026

A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures.
CoRR, February, 2026

2025
World Models Can Leverage Human Videos for Dexterous Manipulation.
CoRR, December, 2025

Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training.
CoRR, September, 2025

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning.
CoRR, June, 2025

GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-Grained Video-Language Learning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Now you see Me: Context-Aware Automatic Audio Description.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Scaling Language-Free Visual Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024
NowYouSee Me: Context-Aware Automatic Audio Description.
CoRR, 2024

Video Token Merging for Long-form Video Understanding.
CoRR, 2024

Video Token Merging for Long Video Understanding.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Text-Guided Video Masked Autoencoder.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos.
CoRR, 2023

MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Motion-Guided Masking for Spatiotemporal Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
OASIS: A Large-Scale Dataset for Single Image 3D in the Wild.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

