Baoqi Pei

Orcid: 0009-0007-7811-7961

According to our database, Baoqi Pei authored at least 18 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline.
CoRR, March, 2026

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
Int. J. Comput. Vis., January, 2026

2025
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT.
CoRR, November, 2025

Guiding Audio-Visual Question Answering with Collective Question Reasoning.
Int. J. Comput. Vis., October, 2025

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT.
CoRR, October, 2025

Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., September, 2025

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision.
CoRR, June, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.
CoRR, March, 2025

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
