Gen Luo

Orcid: 0000-0001-5334-1843

According to our database, Gen Luo authored at least 68 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Towards Parameter-Efficient Network Pruning with Re-Parameterized Adapter.

Int. J. Comput. Vis., April, 2026

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework.

CoRR, March, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing.

CoRR, March, 2026

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments.

CoRR, March, 2026

Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts.

CoRR, February, 2026

Domain incremental learning for object detection.

Pattern Recognit., 2026

Earth-Adapter: Bridge the Geospatial Domain Gaps with a Frequency-Guided Mixture of Adapters.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Omni-Referring Image Segmentation.

CoRR, December, 2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution.

CoRR, October, 2025

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites.

CoRR, October, 2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning.

CoRR, October, 2025

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization.

CoRR, October, 2025

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding.

CoRR, October, 2025

Sequential Diffusion Language Models.

CoRR, September, 2025

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data.

CoRR, September, 2025

GenExam: A Multidisciplinary Text-to-Image Exam.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

MoIL: Momentum Imitation Learning for Efficient Vision-Language Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

CoRR, June, 2025

SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence.

CoRR, June, 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces.

CoRR, June, 2025

Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation.

CoRR, April, 2025

DriveMLM: aligning multi-modal large language models with behavioral planning states for autonomous driving.

Vis. Intell., 2025

Joint Method for XPIC, Equalization and IQ Imbalance Compensation in Space-Air-Ground Broadband Dual-Polarized Systems.

Proceedings of the 102nd IEEE Vehicular Technology Conference, 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Training Long-Context LLMs Efficiently via Chunk-wise Optimization.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Towards Language-Guided Visual Recognition via Dynamic Convolutions.

Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.

IEEE Trans. Multim., 2024

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding.

CoRR, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.

CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.

CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Deep Instruction Tuning for Segment Anything Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

CaM: Cache Merging for Memory-efficient LLMs Inference.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension.

Proceedings of the Computer Vision - ECCV 2024, 2024

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.

IEEE Trans. Neural Networks Learn. Syst., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.

IEEE Trans. Multim., 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.

CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.

CoRR, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.

IEEE Trans. Image Process., 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.

CoRR, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.

Proceedings of the Computer Vision - ECCV 2022, 2022

Active Teacher for Semi-Supervised Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.

CoRR, 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2016

No-reference image sharpness Algorithm based on gradient shape.

Proceedings of the 9th International Congress on Image and Signal Processing, 2016
