Qinghong Lin

ORCID: 0000-0003-2568-2346

According to our database, Qinghong Lin authored at least 44 papers between 2020 and 2025.

Bibliography

2025
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection.
IEEE Trans. Neural Networks Learn. Syst., August, 2025

Reinforcement Learning in Vision: A Survey.
CoRR, August, 2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers.
CoRR, May, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models.
CoRR, May, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction.
CoRR, March, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning.
CoRR, March, 2025

Fusion-Attention Diagnosis Network (FADNet): An end-to-end framework for optic disc segmentation and ocular disease classification.
Inf. Fusion, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ROICtrl: Boosting Instance Control for Visual Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent.
CoRR, 2024

GUI Action Narrator: Where and When Did That Action Take Place?
CoRR, 2024

Learning Long-form Video Prior via Generative Pre-Training.
CoRR, 2024

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training.
CoRR, 2024

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant.
Proceedings of the 5th International Workshop on Human-centric Multimedia Analysis, 2024

Learning Video Context as Interleaved Multimodal Sequences.
Proceedings of the Computer Vision - ECCV 2024, 2024

Bootstrapping SparseFormers from Vision Foundation Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Unsupervised Cross-Modal Hashing With Modality-Interaction.
IEEE Trans. Circuits Syst. Video Technol., September, 2023

Unsupervised Cross-Modal Hashing via Semantic Text Mining.
IEEE Trans. Multim., 2023

Unsupervised Hashing with Semantic Concept Mining.
Proc. ACM Manag. Data, 2023

DiffusionVMR: Diffusion Model for Video Moment Retrieval.
CoRR, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.
CoRR, 2023

VisorGPT: Learning Visual Prior via Generative Pre-Training.
CoRR, 2023

Learning Visual Prior via Generative Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Too Large; Data Reduction for Vision-Language Pre-Training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in One: Exploring Unified Video-Language Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Affordance Grounding from Demonstration Video to Target Image.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
CoRR, 2022

Egocentric Video-Language Pretraining.
CoRR, 2022

Egocentric Video-Language Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Deep Unsupervised Hashing with Latent Semantic Components.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Deep Self-Adaptive Hashing for Image Retrieval.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Label Self-Adaption Hashing for Image Retrieval.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Deep Superpixel Cut for Unsupervised Image Segmentation.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

