Bo Li

Orcid: 0000-0002-8447-0928

Affiliations:
  • Nanyang Technological University, S-Lab, Singapore
  • University of California, Berkeley, CA, USA (2019 - 2021)


According to our database, Bo Li authored at least 44 papers between 2010 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Otter: A Multi-Modal Model With In-Context Instruction Tuning.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Benchmarking and Analyzing Generative Data for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study.
CoRR, August, 2025

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos.
CoRR, January, 2025

Long Context Transfer from Language to Vision.
Trans. Mach. Learn. Res., 2025

LLaVA-OneVision: Easy Visual Task Transfer.
Trans. Mach. Learn. Res., 2025

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025


2024
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs.
CoRR, 2024

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models.
CoRR, 2024

Video Instruction Tuning With Synthetic Data.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning.
CoRR, 2024

Octopus: Embodied Vision-Language Programmer from Environmental Feedback.
Proceedings of the Computer Vision - ECCV 2024, 2024

FunQA: Towards Surprising Video Comprehension.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
OtterHD: A High-Resolution Multi-modality Model.
CoRR, 2023

FunQA: Towards Surprising Video Comprehension.
CoRR, 2023

MIMIC-IT: Multi-Modal In-Context Instruction Tuning.
CoRR, 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning.
CoRR, 2023

Large Language Models are Visual Reasoning Coordinators.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sparse Mixture-of-Experts are Domain Generalizable Learners.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Panoptic Video Scene Graph Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
A Review of Single-Source Deep Unsupervised Visual Domain Adaptation.
IEEE Trans. Neural Networks Learn. Syst., 2022

Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One.
CoRR, 2022

Sparse Fusion Mixture-of-Experts are Domain Generalizable Learners.
CoRR, 2022

Domain Generalization using Pretrained Models without Fine-tuning.
CoRR, 2022

Self-Supervised Pretraining Improves Self-Supervised Pretraining.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Invariant Information Bottleneck for Domain Generalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation.
Int. J. Comput. Vis., 2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision.
CoRR, 2021

Invariant Information Bottleneck for Domain Generalization.
CoRR, 2021

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Rethinking Distributional Matching Based Domain Adaptation.
CoRR, 2020

MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation.
CoRR, 2020

Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey.
CoRR, 2020

2019
Multi-source Domain Adaptation for Semantic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2010
Synchronization with timing recovery loop in UHF RFID reader receivers.
Proceedings of the 17th IEEE International Conference on Electronics, 2010

