We stand with Ukraine

Qinghong Lin

Orcid: 0000-0003-2568-2346

According to our database¹, Qinghong Lin authored at least 48 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation.

Kevin Qinghong Lin

,

,

,

,

,

,

,

Alex Jinpeng Wang

CoRR, November, 2025

Paper2Video: Automatic Video Generation from Scientific Papers.

,

Kevin Qinghong Lin

,

Mike Zheng Shou

CoRR, October, 2025

Code2Video: A Code-centric Paradigm for Educational Video Generation.

,

Kevin Qinghong Lin

,

Mike Zheng Shou

CoRR, October, 2025

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection.

,

Kevin Qinghong Lin

,

,

IEEE Trans. Neural Networks Learn. Syst., August, 2025

Reinforcement Learning in Vision: A Survey.

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

Mike Zheng Shou

CoRR, August, 2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers.

,

Kevin Qinghong Lin

,

,

,

CoRR, May, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models.

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

,

,

Kevin Qinghong Lin

,

Juan A. Rodríguez

,

,

,

Nicolas Chapados

,

,

Aishwarya Agrawal

,

,

Christopher Pal

,

Perouz Taslakian

,

,

CoRR, March, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning.

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

CoRR, March, 2025

Fusion-Attention Diagnosis Network (FADNet): An end-to-end framework for optic disc segmentation and ocular disease classification.

,

,

,

,

,

,

,

Teruko Fukuyama

,

,

,

,

,

,

,

,

,

Inf. Fusion, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.

,

,

Kevin Qinghong Lin

,

Juan A. Rodríguez

,

,

Nicolas Chapados

,

,

Aishwarya Agrawal

,

,

Christopher Pal

,

Perouz Taslakian

,

,

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.

,

,

,

David Junhao Zhang

,

,

Kevin Qinghong Lin

,

,

,

,

Mike Zheng Shou

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary.

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

Kevin Qinghong Lin

,

,

,

,

,

,

Stan Weixian Lei

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ROICtrl: Boosting Instance Control for Visual Generation.

,

,

,

,

,

,

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation.

,

,

,

,

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.

Muhammet Furkan Ilaslan

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

Kevin Qinghong Lin

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

Learning Long-form Video Prior via Generative Pre-Training.

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.

Alex Jinpeng Wang

,

,

Kevin Qinghong Lin

,

,

,

,

,

Mike Zheng Shou

CoRR, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.

Kevin Qinghong Lin

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation.

,

,

,

,

Mike Zheng Shou

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant.

,

,

,

Mike Zheng Shou

Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, 2024

Learning Video Context as Interleaved Multimodal Sequences.

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Computer Vision - ECCV 2024, 2024

Bootstrapping SparseFormers from Vision Foundation Models.

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.

,

,

,

Kevin Qinghong Lin

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Unsupervised Cross-Modal Hashing With Modality-Interaction.

,

,

,

,

,

,

IEEE Trans. Circuits Syst. Video Technol., September, 2023

Unsupervised Cross-Modal Hashing via Semantic Text Mining.

,

,

,

,

,

,

IEEE Trans. Multim., 2023

Unsupervised Hashing with Semantic Concept Mining.

,

,

Kevin Qinghong Lin

,

,

,

,

,

Proc. ACM Manag. Data, 2023

DiffusionVMR: Diffusion Model for Video Moment Retrieval.

,

Kevin Qinghong Lin

,

,

CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

CoRR, 2023

VisorGPT: Learning Visual Prior via Generative Pre-Training.

,

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

CoRR, 2023

Learning Visual Prior via Generative Pre-Training.

,

,

,

,

Kevin Qinghong Lin

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Too Large; Data Reduction for Vision-Language Pre-Training.

Alex Jinpeng Wang

,

Kevin Qinghong Lin

,

David Junhao Zhang

,

Stan Weixian Lei

,

Mike Zheng Shou

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.

Shraman Pramanick

,

,

,

Kevin Qinghong Lin

,

,

Mike Zheng Shou

,

,

Pengchuan Zhang

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.

Kevin Qinghong Lin

,

Pengchuan Zhang

,

,

Shraman Pramanick

,

,

Alex Jinpeng Wang

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in One: Exploring Unified Video-Language Pre-Training.

,

,

,

,

Kevin Qinghong Lin

,

Satoshi Tsutsui

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.

,

,

Kevin Qinghong Lin

,

Mike Zheng Shou

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining.

Kevin Qinghong Lin

,

Alex Jinpeng Wang

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

CoRR, 2022

Egocentric Video-Language Pretraining.

Kevin Qinghong Lin

,

,

,

,

,

Eric Zhongcong Xu

,

,

,

,

,

,

,

,

,

,

Mike Zheng Shou

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Deep Unsupervised Hashing with Latent Semantic Components.

,

,

,

,

,

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Deep Self-Adaptive Hashing for Image Retrieval.

,

,

,

,

Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020

Label Self-Adaption Hashing for Image Retrieval.

,

,

,

,

Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Superpixel Cut for Unsupervised Image Segmentation.

,

,

Proceedings of the 25th International Conference on Pattern Recognition, 2020
