Yi Wang

Orcid: 0000-0002-1728-9563

Affiliations:
  • Shanghai AI Laboratory, China


According to our database, Yi Wang authored at least 72 papers between 2012 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs.
CoRR, July, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos.
CoRR, June, 2025

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning.
CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.
Int. J. Comput. Vis., May, 2025

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance.
CoRR, April, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.
CoRR, April, 2025

MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion.
Pattern Anal. Appl., March, 2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models.
CoRR, March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.
CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.
CoRR, January, 2025

Scale-View Co-Awareness Framework for Simultaneous Segmentation of Pancreas and Tumors.
IEEE Trans. Instrum. Meas., 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

SyncVIS: Synchronized Video Instance Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

OSFENet: Object Spatiotemporal Feature Enhanced Network for Surgical Phase Recognition.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Conditional Temporal Variational AutoEncoder for Action Video Prediction.
Int. J. Comput. Vis., October, 2023

Open World Entity Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Advancing Pancreas Segmentation through the Patch-Adjust Fusion Framework.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2023

2022
PointINS: Point-Based Instance Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

CMC_v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Towards Implicit Text-Guided 3D Shape Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAT: Mask-Aware Transformer for Large Hole Image Inpainting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Blood Vessel Segmentation Based on the 3D Residual U-Net.
Int. J. Pattern Recognit. Artif. Intell., 2021

Image Synthesis via Semantic Composition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Automatic liver tumour segmentation in CT combining FCN and NMF-based deformable model.
Comput. methods Biomech. Biomed. Eng. Imaging Vis., 2020

VCNet: A Robust Approach to Blind Image Inpainting.
Proceedings of the Computer Vision - ECCV 2020, 2020

Attentive Normalization for Conditional Image Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Liver Vessels Segmentation Based on 3d Residual U-NET.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Wide-Context Semantic Image Extrapolation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Wavelet-based extended morphological profile and deep autoencoder for hyperspectral image classification.
Int. J. Wavelets Multiresolution Inf. Process., 2018

Scale-recurrent Network for Deep Image Deblurring.
CoRR, 2018

Image Inpainting via Generative Multi-column Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Automatic Liver Lesion Segmentation in CT Combining Fully Convolutional Networks and Non-negative Matrix Factorization.
Proceedings of the Imaging for Patient-Customized Simulations and Systems for Point-of-Care Ultrasound, 2017

A novel variational method for liver segmentation based on statistical shape model prior and enforced local statistical feature.
Proceedings of the 14th IEEE International Symposium on Biomedical Imaging, 2017

2016
Face recognition via collaborative representation based multiple one-dimensional embedding.
Int. J. Wavelets Multiresolution Inf. Process., 2016

2014
An effective and robust method for modeling multi-furcation liver vessel by using Gap Border Pairing.
Comput. Medical Imaging Graph., 2014

2013
Automatic Multi-Scale Segmentation of Intrahepatic Vessel in CT Images for Liver Surgery Planning.
Int. J. Pattern Recognit. Artif. Intell., 2013

2012
Interconnection of wind farms with grid using a MTDC network.
Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012

