Boqiang Zhang

Orcid: 0000-0002-5314-4054

According to our database, Boqiang Zhang authored at least 22 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders.
CoRR, March, 2026

Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios.
IEEE Trans. Image Process., 2026

Multi-axial vibration fatigue optimization strategy based on the artificial intelligence algorithm.
Eng. Appl. Artif. Intell., 2026

2025
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models.
CoRR, December, 2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources.
CoRR, September, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs.
CoRR, February, 2025

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding.
CoRR, January, 2025

A study on state estimation of distributed electric drive articulated vehicle.
J. Comput. Des. Eng., 2025

An urban change detection method based on multimodal data and knowledge graph technology.
Int. J. Digit. Earth, 2025

Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing.
CoRR, 2024

Visual large language model for wheat disease diagnosis in the wild.
Comput. Electron. Agric., 2024

How Control Information Influences Multilingual Text Image Generation and Editing?
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Research on unmanned transfer vehicle path planning for raw grain warehousing.
J. Intell. Fuzzy Syst., October, 2023

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
