Jifeng Dai
Orcid: 0000-0002-6785-0785
According to our database1,
Jifeng Dai
authored at least 162 papers
between 2011 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on orcid.org
On csauthors.net:
Bibliography
2025
ACM Comput. Surv., November, 2025
Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy.
CoRR, August, 2025
CoRR, July, 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning.
CoRR, July, 2025
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.
CoRR, July, 2025
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025
CoRR, June, 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces.
CoRR, June, 2025
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings.
CoRR, May, 2025
CoRR, May, 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning.
CoRR, May, 2025
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.
CoRR, April, 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025
BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025
CoRR, March, 2025
CoRR, March, 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.
CoRR, March, 2025
CoRR, March, 2025
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.
CoRR, January, 2025
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
2024
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024
Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding.
CoRR, 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving.
CoRR, 2024
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost.
CoRR, 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
CoRR, 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.
CoRR, 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024
CoRR, 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024
Effect of a reduced arterial axial pre-stretch ratio during aging on the cardiac output and cerebral blood flow in the healthy elders.
Comput. Methods Programs Biomed., 2024
MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.
Sci. China Inf. Sci., 2024
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.
Sci. China Inf. Sci., 2024
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Intern VL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
CoRR, 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.
CoRR, 2023
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision.
CoRR, 2023
CoRR, 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow.
CoRR, 2023
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022
CoRR, 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers.
CoRR, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
CoRR, 2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
CoRR, 2021
CoRR, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
2020
1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask.
CoRR, 2020
Proceedings of the 8th International Conference on Learning Representations, 2020
Proceedings of the 8th International Conference on Learning Representations, 2020
Proceedings of the Computer Vision - ECCV 2020, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
CoRR, 2018
Proceedings of the Computer Vision - ECCV 2018, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
Proceedings of the IEEE International Conference on Computer Vision, 2017
Proceedings of the IEEE International Conference on Computer Vision, 2017
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
2016
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Proceedings of the Computer Vision - ECCV 2016, 2016
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016
2015
Proceedings of the 3rd International Conference on Learning Representations, 2015
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
2014
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
2013
Proceedings of the IEEE International Conference on Computer Vision, 2013
2012
IEEE Trans. Pattern Anal. Mach. Intell., 2012
Proceedings of the 21st International Conference on Pattern Recognition, 2012
2011
IEEE Trans. Pattern Anal. Mach. Intell., 2011