Bozhou Li

Orcid: 0009-0001-7519-5733

According to our database, Bozhou Li authored at least 21 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV.
CoRR, May, 2026

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning.
CoRR, May, 2026

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos.
CoRR, May, 2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models.
CoRR, April, 2026

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models.
CoRR, February, 2026

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers.
CoRR, February, 2026

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks.
CoRR, February, 2026

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models.
CoRR, January, 2026

2025
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models.
CoRR, December, 2025

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss.
CoRR, December, 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration.
CoRR, October, 2025

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark.
CoRR, September, 2025

Text2VectorSQL: Bridging Text-to-SQL and Vector Search for Unified Natural Language Queries.
CoRR, June, 2025

ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models.
CoRR, May, 2025

The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-Based Code Generation.
Proc. ACM Softw. Eng., 2025

SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

An Adaptive Attention-Aware Method for Occluded Multi-Pedestrian Tracking.
Proceedings of the 28th International Conference on Computer Supported Cooperative Work in Design, 2025

2024
Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models.
CoRR, 2024

Are Bigger Encoders Always Better in Vision Large Models?
CoRR, 2024

A Survey of Multimodal Large Language Model from A Data-centric Perspective.
CoRR, 2024

2021
Cluster-Based Distribution Alignment For Generalizable Person Re-Identification.
Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops, 2021

