Haotian Zhang

Orcid: 0000-0001-6809-0426

Affiliations:
  • Apple AI/ML, Cupertino, CA, USA
  • University of Washington, Department of Electrical and Computer Engineering, Seattle, WA, USA


According to our database, Haotian Zhang authored at least 29 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Improve Vision Language Model Chain-of-thought Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms.
CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.
CoRR, 2024

Contrastive Localized Language-Image Pre-Training.
CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.
CoRR, 2024

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models.
CoRR, 2024

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
CoRR, 2024

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts.
CoRR, 2024

Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs.
Proceedings of the Computer Vision - ECCV 2024, 2024


VeCLIP: Improving CLIP Training via Visual-Enriched Captions.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions.
CoRR, 2023

2022
DIOR: DIstill Observations to Representations for Multi-Object Tracking and Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Grounded Language-Image Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
ROD2021 Challenge: A Summary for Radar Object Detection Challenge for Autonomous Driving Applications.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Monocular 3D Localization of Vehicles in Road Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

2020
Bundle Adjustment for Monocular Visual Odometry Based on Detections of Traffic Signs.
IEEE Trans. Veh. Technol., 2020

IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency.
CoRR, 2020

2019
Eye in the Sky: Drone-Based Object Tracking and 3D Localization.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Exploit the Connectivity: Multi-Object Tracking with TrackletNet.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Bundle Adjustment for Monocular Visual Odometry Based on Detected Traffic Sign Features.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019


