Zhenbo Luo

ORCID: 0009-0002-5836-0749

According to our database, Zhenbo Luo authored at least 51 papers between 2014 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment.

CoRR, May, 2026

Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA.

CoRR, April, 2026

OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering.

CoRR, April, 2026

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models.

CoRR, April, 2026

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously.

CoRR, March, 2026

IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation.

CoRR, March, 2026

PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues.

CoRR, March, 2026

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models.

CoRR, February, 2026

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding.

CoRR, February, 2026

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding.

CoRR, February, 2026

Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension.

CoRR, February, 2026

GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving.

CoRR, February, 2026

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation.

CoRR, February, 2026

Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models.

Wenhui Tan

Fiorenzo Parascandolo

CoRR, February, 2026

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models.

CoRR, January, 2026

Federated Balanced Learning.

CoRR, January, 2026

Federated Joint Learning for Domain and Class Generalization.

CoRR, January, 2026

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding.

CoRR, January, 2026

AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Xiaomi MiMo-VL-Miloco Technical Report.

CoRR, December, 2025

TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding.

CoRR, November, 2025

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding.

CoRR, November, 2025

HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration.

CoRR, October, 2025

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition.

CoRR, September, 2025

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent.

CoRR, September, 2025

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering.

CoRR, September, 2025

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle.

CoRR, August, 2025

MiMo-VL Technical Report.

CoRR, June, 2025

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains.

CoRR, May, 2025

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Q-Frame: Query-Aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Let Your Car Listen to Your Respiration Contactlessly with Ubiquitous Acoustic Signals.

Proceedings of the Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2025

2020

Realtime multi-scale scene text detection with scale-based region proposal network.

Pattern Recognit., 2020

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera.

CoRR, 2020

Multi Receptive Field Network for Semantic Segmentation.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Structured Knowledge Distillation for Semantic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks.

Neurocomputing, 2018

R² CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection.

Proceedings of the 24th International Conference on Pattern Recognition, 2018

Monocular Relative Depth Perception With Web Stereo Data Supervision.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks.

CoRR, 2017

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

CoRR, 2017

Deep Residual Text Detection Network for Scene Text.

Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

End-to-End Scene Text Recognition in Videos Based on Multi Frame Tracking.

Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT.

Muhammad Muzzamil Luqman

Jean-Christophe Burie

Cheng-Lin Liu

Jean-Marc Ogier

Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition, 2017

Fast Genre Classification of Web Images Using Global and Local Features.

Proceedings of the 4th IAPR Asian Conference on Pattern Recognition, 2017

2016

Unsupervised Adaptation of Neural Networks for Chinese Handwriting Recognition.

Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

Random Projected Convolutional Feature for Scene Text Recognition.

Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, 2016

2014

Enhanced Non-linear Features for On-line Handwriting Recognition Using Deep Learning.

Proceedings of the Neural Information Processing - 21st International Conference, 2014
