Qinghong Lin

Orcid: 0000-0003-2568-2346

According to our database¹, Qinghong Lin authored at least 65 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects.

[DOI]

,

,

,

,

Kevin Qinghong Lin

,

,

CoRR, May, 2026

AI for Auto-Research: Roadmap & User Guide.

[DOI]

,

,

,

,

Kevin Qinghong Lin

,

Xuan Billy Zhang

,

,

,

,

,

,

,

,

,

,

,

Benoit R. Cottereau

,

,

,

CoRR, May, 2026

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration.

[DOI]

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

CoRR, May, 2026

Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation.

[DOI]

,

,

Kevin Qinghong Lin

,

,

,

,

Amir Atapour-Abarghouei

CoRR, May, 2026

Reasoning Compression with Mixed-Policy Distillation.

[DOI]

,

,

,

,

,

Kevin Qinghong Lin

,

CoRR, May, 2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond.

[DOI]

,

Xuan Billy Zhang

,

Kevin Qinghong Lin

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

,

,

,

,

,

CoRR, April, 2026

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents.

[DOI]

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

CoRR, April, 2026

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

CoRR, March, 2026

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents.

[DOI]

,

,

Kevin Qinghong Lin

,

,

,

Patrice Béchard

,

,

CoRR, March, 2026

Code2World: A GUI World Model via Renderable Code Generation.

[DOI]

,

,

,

,

,

,

,

,

Kevin Qinghong Lin

CoRR, February, 2026

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection.

[DOI]

,

Kevin Qinghong Lin

,

Mike Zheng Shou

,

CoRR, January, 2026

A Survey on Foundations and Frontiers of Multimodal Agentic Frameworks: Techniques and Applications.

[DOI]

,

,

,

,

Shraman Pramanick

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

,

Mohamed Elhoseiny

,

,

,

,

,

Sanjoy Chowdhury

,

Trans. Mach. Learn. Res., 2026

2025

ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands.

[DOI]

,

Kevin Qinghong Lin

,

Mike Zheng Shou

CoRR, December, 2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

[DOI]

,

,

,

,

,

,

,

,

Kevin Qinghong Lin

CoRR, December, 2025

Computer-Use Agents as Judges for Generative User Interface.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

,

Mike Zheng Shou

CoRR, November, 2025

Grounding Computer Use Agents on Human Demonstrations.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

,

Johan Obando-Ceron

,

Juan A. Rodríguez

,

Nicolas Chapados

,

,

Adriana Romero-Soriano

,

Reihaneh Rabbany

,

Perouz Taslakian

,

Christopher Pal

,

,

CoRR, November, 2025

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

,

,

Alex Jinpeng Wang

CoRR, November, 2025

Paper2Video: Automatic Video Generation from Scientific Papers.

[DOI]

,

Kevin Qinghong Lin

,

Mike Zheng Shou

CoRR, October, 2025

Code2Video: A Code-centric Paradigm for Educational Video Generation.

[DOI]

,

Kevin Qinghong Lin

,

Mike Zheng Shou

CoRR, October, 2025

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection.

[DOI]

,

Kevin Qinghong Lin

,

,

IEEE Trans. Neural Networks Learn. Syst., August, 2025

Reinforcement Learning in Vision: A Survey.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

Mike Zheng Shou

CoRR, August, 2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers.

[DOI]

,

Kevin Qinghong Lin

,

,

,

CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

[DOI]

,

,

Kevin Qinghong Lin

,

Juan A. Rodríguez

,

,

,

Nicolas Chapados

,

,

Aishwarya Agrawal

,

,

Christopher Pal

,

Perouz Taslakian

,

,

CoRR, March, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning.

[DOI]

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

CoRR, March, 2025

Fusion-Attention Diagnosis Network (FADNet): An end-to-end framework for optic disc segmentation and ocular disease classification.

[DOI]

,

,

,

,

,

,

,

Teruko Fukuyama

,

,

,

,

,

,

,

,

,

Inf. Fusion, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models.

[DOI]

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GUI-Narrator: Detecting and Captioning Computer GUI Actions.

[DOI]

,

,

,

,

Mike Zheng Shou

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

[DOI]

,

,

Kevin Qinghong Lin

,

Juan A. Rodríguez

,

,

Nicolas Chapados

,

,

Aishwarya Agrawal

,

,

Christopher Pal

,

Perouz Taslakian

,

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.

[DOI]

,

,

,

David Junhao Zhang

,

,

Kevin Qinghong Lin

,

,

,

,

Mike Zheng Shou

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary.

[DOI]

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

,

Stan Weixian Lei

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ROICtrl: Boosting Instance Control for Visual Generation.

[DOI]

,

,

,

,

,

,

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation.

[DOI]

,

,

,

,

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.

[DOI]

Muhammet Furkan Ilaslan

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?

[DOI]

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

Learning Long-form Video Prior via Generative Pre-Training.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.

[DOI]

Alex Jinpeng Wang

,

,

Kevin Qinghong Lin

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.

[DOI]

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation.

[DOI]

,

,

,

,

Mike Zheng Shou

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant.

[DOI]

,

,

,

Mike Zheng Shou

Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, 2024

Learning Video Context as Interleaved Multimodal Sequences.

[DOI]

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Computer Vision - ECCV 2024, 2024

Bootstrapping SparseFormers from Vision Foundation Models.

[DOI]

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Unsupervised Cross-Modal Hashing With Modality-Interaction.

[DOI]

,

,

,

,

,

,

IEEE Trans. Circuits Syst. Video Technol., September, 2023

Unsupervised Cross-Modal Hashing via Semantic Text Mining.

[DOI]

,

,

,

,

,

,

IEEE Trans. Multim., 2023

Unsupervised Hashing with Semantic Concept Mining.

[DOI]

,

,

Kevin Qinghong Lin

,

,

,

,

,

Proc. ACM Manag. Data, 2023

DiffusionVMR: Diffusion Model for Video Moment Retrieval.

[DOI]

,

Kevin Qinghong Lin

,

,

CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.

[DOI]

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

CoRR, 2023

VisorGPT: Learning Visual Prior via Generative Pre-Training.

[DOI]

,

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

CoRR, 2023

Learning Visual Prior via Generative Pre-Training.

[DOI]

,

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Too Large; Data Reduction for Vision-Language Pre-Training.

[DOI]

Alex Jinpeng Wang

,

Kevin Qinghong Lin

,

David Junhao Zhang

,

Stan Weixian Lei

,

Mike Zheng Shou

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.

[DOI]

Shraman Pramanick

,

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

,

Pengchuan Zhang

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.

[DOI]

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

Shraman Pramanick

,

,

Alex Jinpeng Wang

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in One: Exploring Unified Video-Language Pre-Training.

[DOI]

,

,

,

,

Kevin Qinghong Lin

,

Satoshi Tsutsui

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.

[DOI]

,

,

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.

[DOI]

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

[DOI]

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining.

[DOI]

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining.

[DOI]

Kevin Qinghong Lin

,

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Deep Unsupervised Hashing with Latent Semantic Components.

[DOI]

,

,

,

,

,

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Deep Self-Adaptive Hashing for Image Retrieval.

[DOI]

,

,

,

,

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020

Label Self-Adaption Hashing for Image Retrieval.

[DOI]

,

,

,

,

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Superpixel Cut for Unsupervised Image Segmentation.

[DOI]

,

,

Proceedings of the 25th International Conference on Pattern Recognition, 2020
