Yi Wang

ORCID: 0000-0002-1728-9563

Affiliations:

Shanghai AI Laboratory, China

According to our database, Yi Wang authored at least 79 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations.

CoRR, October, 2025

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation.

CoRR, October, 2025

Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale.

CoRR, September, 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

A Lightweight Group Multiscale Bidirectional Interactive Network for Real-Time Steel Surface Defect Detection.

CoRR, August, 2025

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs.

CoRR, July, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.

CoRR, June, 2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos.

CoRR, June, 2025

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning.

CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.

Int. J. Comput. Vis., May, 2025

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance.

CoRR, April, 2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning.

CoRR, April, 2025

MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion.

Pattern Anal. Appl., March, 2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models.

CoRR, March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.

CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.

CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.

CoRR, January, 2025

Scale-View Co-Awareness Framework for Simultaneous Segmentation of Pancreas and Tumors.

IEEE Trans. Instrum. Meas., 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

SyncVIS: Synchronized Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

OSFENet: Object Spatiotemporal Feature Enhanced Network for Surgical Phase Recognition.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Conditional Temporal Variational AutoEncoder for Action Video Prediction.

Int. J. Comput. Vis., October, 2023

Open World Entity Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

CoRR, 2023

VideoChat: Chat-Centric Video Understanding.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Advancing Pancreas Segmentation through the Patch-Adjust Fusion Framework.

Yong Zhang

Yi Wang

Bin Fang

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2023

2022

PointINS: Point-Based Instance Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

CoRR, 2022

CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors.

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.

Proceedings of the Computer Vision - ECCV 2022, 2022

CMC_v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Towards Implicit Text-Guided 3D Shape Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAT: Mask-Aware Transformer for Large Hole Image Inpainting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Blood Vessel Segmentation Based on the 3D Residual U-Net.

Int. J. Pattern Recognit. Artif. Intell., 2021

Image Synthesis via Semantic Composition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Automatic liver tumour segmentation in CT combining FCN and NMF-based deformable model.

Comput. methods Biomech. Biomed. Eng. Imaging Vis., 2020

VCNet: A Robust Approach to Blind Image Inpainting.

Proceedings of the Computer Vision - ECCV 2020, 2020

Attentive Normalization for Conditional Image Generation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Liver Vessels Segmentation Based on 3d Residual U-NET.

Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Wide-Context Semantic Image Extrapolation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Wavelet-based extended morphological profile and deep autoencoder for hyperspectral image classification.

Int. J. Wavelets Multiresolution Inf. Process., 2018

Scale-recurrent Network for Deep Image Deblurring.

CoRR, 2018

Image Inpainting via Generative Multi-column Convolutional Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Automatic Liver Lesion Segmentation in CT Combining Fully Convolutional Networks and Non-negative Matrix Factorization.

Proceedings of the Imaging for Patient-Customized Simulations and Systems for Point-of-Care Ultrasound, 2017

A novel variational method for liver segmentation based on statistical shape model prior and enforced local statistical feature.

Proceedings of the 14th IEEE International Symposium on Biomedical Imaging, 2017

2016

Face recognition via collaborative representation based multiple one-dimensional embedding.

Int. J. Wavelets Multiresolution Inf. Process., 2016

2014

An effective and robust method for modeling multi-furcation liver vessel by using Gap Border Pairing.

Comput. Medical Imaging Graph., 2014

2013

Automatic Multi-Scale Segmentation of Intrahepatic Vessel in CT Images for Liver Surgery Planning.

Patrick Shen-Pei Wang

Hongguang Wang

Int. J. Pattern Recognit. Artif. Intell., 2013

On an asymptotic analysis of polynomial approximation to halfband filters.

Charles A. Micchelli

Jianzhong Wang

Yi Wang

Adv. Comput. Math., 2013

2012

Interconnection of wind farms with grid using a MTDC network.

Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012
