Zhenbo Luo

Orcid: 0009-0002-5836-0749

According to our database, Zhenbo Luo authored at least 51 papers between 2014 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment.
CoRR, May, 2026

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA.
CoRR, April, 2026

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering.
CoRR, April, 2026

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models.
CoRR, April, 2026

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously.
CoRR, March, 2026

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation.
CoRR, March, 2026

PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues.
CoRR, March, 2026

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models.
CoRR, February, 2026

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding.
CoRR, February, 2026

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding.
CoRR, February, 2026

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension.
CoRR, February, 2026

GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving.
CoRR, February, 2026

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation.
CoRR, February, 2026

Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models.
CoRR, February, 2026

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models.
CoRR, January, 2026

Federated Balanced Learning.
CoRR, January, 2026

Federated Joint Learning for Domain and Class Generalization.
CoRR, January, 2026

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding.
CoRR, January, 2026

AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Xiaomi MiMo-VL-Miloco Technical Report.
CoRR, December, 2025

TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding.
CoRR, November, 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding.
CoRR, November, 2025

HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration.
CoRR, October, 2025

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition.
CoRR, September, 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent.
CoRR, September, 2025

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering.
CoRR, September, 2025

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle.
CoRR, August, 2025

MiMo-VL Technical Report.
CoRR, June, 2025

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains.
CoRR, May, 2025

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Q-Frame: Query-Aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Let Your Car Listen to Your Respiration Contactlessly with Ubiquitous Acoustic Signals.
Proceedings of the Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2025

2020
Realtime multi-scale scene text detection with scale-based region proposal network.
Pattern Recognit., 2020

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera.
CoRR, 2020

Multi Receptive Field Network for Semantic Segmentation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Structured Knowledge Distillation for Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks.
Neurocomputing, 2018

R² CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Monocular Relative Depth Perception With Web Stereo Data Supervision.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks.
CoRR, 2017

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
CoRR, 2017

Deep Residual Text Detection Network for Scene Text.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

End-to-End Scene Text Recognition in Videos Based on Multi Frame Tracking.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT.
Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Fast Genre Classification of Web Images Using Global and Local Features.
Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

2016
Unsupervised Adaptation of Neural Networks for Chinese Handwriting Recognition.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Random Projected Convolutional Feature for Scene Text Recognition.
Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

2014
Enhanced Non-linear Features for On-line Handwriting Recognition Using Deep Learning.
Proceedings of the Neural Information Processing - 21st International Conference, 2014

