Yinan He

Orcid: 0000-0002-0969-6310

According to our database, Yinan He authored at least 57 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2026

2025

ExpVid: A Benchmark for Experiment Video Understanding & Reasoning.

CoRR, October, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models.

CoRR, June, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.

CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.

Int. J. Comput. Vis., May, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.

CoRR, April, 2025

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness.

CoRR, March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.

CoRR, January, 2025

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency.

CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.

CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.

CoRR, January, 2025

A Novel Hybrid-DCNN-Based Framework for Enhanced Rice Aboveground Biomass Estimation Under Limited Samples.

IEEE Trans. Geosci. Remote. Sens., 2025

Learning Discriminative Representations in Videos via Active Embedding Distance Correlation.

IEEE Signal Process. Lett., 2025

Discovering robust biomarkers of psychiatric disorders from resting-state functional MRI via graph neural networks: A systematic review.

NeuroImage, 2025

VideoChat: chat-centric video understanding.

Sci. China Inf. Sci., 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Hybrid Active Learning with Uncertainty-Weighted Embeddings.

Trans. Mach. Learn. Res., 2024

Downscaling Administrative-Level Crop Yield Statistics to 1 km Grids Using Multisource Remote Sensing Data and Ensemble Machine Learning.

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review.

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Subtype-Specific Biomarkers of Alzheimer's Disease from Anatomical and Functional Connectomes via Graph Neural Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

CoRR, 2022

Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge.

Yinan He, Guo Chen

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

TreMo: Continuous Vital Sign Monitoring Based on Subtle Intrinsic Tremors with COTS Mobile Devices.

Yinan He, Yi Jiang, Hongzi Zhu

Proceedings of the IEEE International Conference on Communications, 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

ForgeryNet - Face Forgery Analysis Challenge 2021: Methods and Results.

CoRR, 2021

INTERN: A New Learning Paradigm Towards General Vision.

CoRR, 2021

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019

Inter-frame Relationship Graph Based Near-Duplicate Video Clip Detection Method.

Proceedings of the Image and Graphics Technologies and Applications, 2019

2017

Highly Portable, Sensor-Based System for Human Fall Monitoring.

Sensors, 2017

2009

Height Servo System for Straw-Checkerboard Sand Barriers Paving Robot.

Proceedings of the 2009 Second International Symposium on Computational Intelligence and Design, 2009
