Qinghong Lin

ORCID: 0000-0003-2568-2346

According to our database, Qinghong Lin authored at least 65 papers between 2020 and 2026.

Bibliography

2026
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects.
CoRR, May, 2026

AI for Auto-Research: Roadmap & User Guide.
CoRR, May, 2026

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration.
CoRR, May, 2026

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation.
CoRR, May, 2026

Reasoning Compression with Mixed-Policy Distillation.
CoRR, May, 2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond.
CoRR, April, 2026

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents.
CoRR, April, 2026

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks.
CoRR, March, 2026

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents.
CoRR, March, 2026

Code2World: A GUI World Model via Renderable Code Generation.
CoRR, February, 2026

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection.
CoRR, January, 2026

A Survey on Foundations and Frontiers of Multimodal Agentic Frameworks: Techniques and Applications.
Trans. Mach. Learn. Res., 2026

2025
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands.
CoRR, December, 2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
CoRR, December, 2025

Computer-Use Agents as Judges for Generative User Interface.
CoRR, November, 2025

Grounding Computer Use Agents on Human Demonstrations.
CoRR, November, 2025

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation.
CoRR, November, 2025

Paper2Video: Automatic Video Generation from Scientific Papers.
CoRR, October, 2025

Code2Video: A Code-centric Paradigm for Educational Video Generation.
CoRR, October, 2025

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

Reinforcement Learning in Vision: A Survey.
CoRR, August, 2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers.
CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
CoRR, March, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning.
CoRR, March, 2025

Fusion-Attention Diagnosis Network (FADNet): An end-to-end framework for optic disc segmentation and ocular disease classification.
Inf. Fusion, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GUI-Narrator: Detecting and Captioning Computer GUI Actions.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ROICtrl: Boosting Instance Control for Visual Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent.
CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?
CoRR, 2024

Learning Long-form Video Prior via Generative Pre-Training.
CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.
CoRR, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant.
Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, 2024

Learning Video Context as Interleaved Multimodal Sequences.
Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

Bootstrapping SparseFormers from Vision Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Unsupervised Cross-Modal Hashing With Modality-Interaction.
IEEE Trans. Circuits Syst. Video Technol., September, 2023

Unsupervised Cross-Modal Hashing via Semantic Text Mining.
IEEE Trans. Multim., 2023

Unsupervised Hashing with Semantic Concept Mining.
Proc. ACM Manag. Data, 2023

DiffusionVMR: Diffusion Model for Video Moment Retrieval.
CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.
CoRR, 2023

VisorGPT: Learning Visual Prior via Generative Pre-Training.
CoRR, 2023

Learning Visual Prior via Generative Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Too Large; Data Reduction for Vision-Language Pre-Training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in One: Exploring Unified Video-Language Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining.
CoRR, 2022

Egocentric Video-Language Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Deep Unsupervised Hashing with Latent Semantic Components.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Deep Self-Adaptive Hashing for Image Retrieval.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Label Self-Adaption Hashing for Image Retrieval.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Superpixel Cut for Unsupervised Image Segmentation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020