Yinan He

ORCID: 0000-0002-6200-9938

According to our database, Yinan He authored at least 51 papers between 2009 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models.
CoRR, June, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos.
CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.
Int. J. Comput. Vis., May, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.
CoRR, April, 2025

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness.
CoRR, March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency.
CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.
CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.
CoRR, January, 2025

A Novel Hybrid-DCNN-Based Framework for Enhanced Rice Aboveground Biomass Estimation Under Limited Samples.
IEEE Trans. Geosci. Remote. Sens., 2025

Learning Discriminative Representations in Videos via Active Embedding Distance Correlation.
IEEE Signal Process. Lett., 2025

Discovering robust biomarkers of psychiatric disorders from resting-state functional MRI via graph neural networks: A systematic review.
NeuroImage, 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Hybrid Active Learning with Uncertainty-Weighted Embeddings.
Trans. Mach. Learn. Res., 2024

Downscaling Administrative-Level Crop Yield Statistics to 1 km Grids Using Multisource Remote Sensing Data and Ensemble Machine Learning.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Subtype-Specific Biomarkers of Alzheimer's Disease from Anatomical and Functional Connectomes via Graph Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

TreMo: Continuous Vital Sign Monitoring Based on Subtle Intrinsic Tremors with COTS Mobile Devices.
Proceedings of the IEEE International Conference on Communications, 2022

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
ForgeryNet - Face Forgery Analysis Challenge 2021: Methods and Results.
CoRR, 2021

INTERN: A New Learning Paradigm Towards General Vision.
CoRR, 2021

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019
Inter-frame Relationship Graph Based Near-Duplicate Video Clip Detection Method.
Proceedings of the Image and Graphics Technologies and Applications, 2019

2017
Highly Portable, Sensor-Based System for Human Fall Monitoring.
Sensors, 2017

2009
Height Servo System for Straw-Checkerboard Sand Barriers Paving Robot.
Proceedings of the 2009 Second International Symposium on Computational Intelligence and Design, 2009
