Hao Tian
ORCID: 0009-0003-0941-3629
Affiliations:
- SenseTime, Beijing, China
- University of Heidelberg, Germany (until 2022)
According to our database, Hao Tian authored at least 22 papers between 2021 and 2025.
Online presence:
- orcid.org
Bibliography
2025
MiMo: Unlocking the Reasoning Potential of Language Model - From Pretraining to Posttraining.
CoRR, May, 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.
CoRR, March, 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
2024
Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024
MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.
Sci. China Inf. Sci., 2024
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024
2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
CoRR, 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.
CoRR, 2023
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022
2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021