Jifeng Dai

Orcid: 0000-0002-6785-0785

According to our database¹, Jifeng Dai authored at least 175 papers between 2011 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond.

CoRR, May, 2026

Driving Intents Amplify Planning-Oriented Reinforcement Learning.

Hengtong Lu

Victor Shea-Jay Huang

CoRR, May, 2026

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving.

Yuzhou Huang

Benjin Zhu

Hengtong Lu

Victor Shea-Jay Huang

CoRR, May, 2026

Action Emergence from Streaming Intent.

Pengfei Jing

Victor Shea-Jay Huang

CoRR, May, 2026

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks.

CoRR, February, 2026

Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Spatial Frequency Modulation for Semantic Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook.

ACM Comput. Surv., November, 2025

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling.

CoRR, November, 2025

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning.

CoRR, October, 2025

Sequential Diffusion Language Models.

CoRR, September, 2025

GenExam: A Multidisciplinary Text-to-Image Exam.

CoRR, September, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency.

CoRR, August, 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.

CoRR, July, 2025

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning.

CoRR, July, 2025

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.

CoRR, July, 2025

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.

CoRR, June, 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces.

CoRR, June, 2025

ZeroGUI: Automating Online GUI Learning at Zero Human Cost.

CoRR, May, 2025

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings.

CoRR, May, 2025

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space.

CoRR, May, 2025

EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning.

CoRR, May, 2025

Demystify Transformers & Convolutions in Modern Image Deep Networks.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models.

CoRR, April, 2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.

CoRR, April, 2025

BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing.

CoRR, March, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.

CoRR, March, 2025

DriveMLM: aligning multi-modal large language models with behavioral planning states for autonomous driving.

Vis. Intell., 2025

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

CoMemo: LVLMs Need Image Context with Image Memory.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Docopilot: Improving Multimodal Models for Document-Level Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.

Vis. Intell., 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving.

CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.

CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.

CoRR, 2024

Diffusion Transformer Policy.

CoRR, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation.

CoRR, 2024

big.LITTLE Vision Transformer for Efficient Visual Recognition.

CoRR, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.

CoRR, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

Hierarchical Memory for Long Video QA.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams.

CoRR, 2024

LLMs Meet Multimodal Generation and Editing: A Survey.

CoRR, 2024

FLoRA: Low-Rank Core Space for N-dimension.

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

CoRR, 2024

Effect of a reduced arterial axial pre-stretch ratio during aging on the cardiac output and cerebral blood flow in the healthy elders.

Comput. Methods Programs Biomed., 2024

MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity.

Sci. China Inf. Sci., 2024

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites.

Sci. China Inf. Sci., 2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Parameter-Inverted Image Pyramid Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Needle In A Multimodal Haystack.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.

Proceedings of the Computer Vision - ECCV 2024, 2024

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

Proceedings of the Computer Vision - ECCV 2024, 2024

Distilling Knowledge from Large-Scale Image Models for Object Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

CoRR, 2023

A Survey of Reasoning with Foundation Models.

CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.

CoRR, 2023

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.

CoRR, 2023

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision.

CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

CoRR, 2023

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models.

CoRR, 2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow.

CoRR, 2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling.

CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Vision Transformer Adapter for Dense Predictions.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Goal-oriented Autonomous Driving.

CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.

CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.

CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.

CoRR, 2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning.

CoRR, 2022

ConvMAE: Masked Convolution Meets Masked Autoencoders.

CoRR, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers.

CoRR, 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MCMAE: Masked Convolution Meets Masked Autoencoders.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.

Proceedings of the Computer Vision - ECCV 2022, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.

Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.

Proceedings of the Computer Vision - ECCV 2022, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.

Proceedings of the Computer Vision - ECCV 2022, 2022

FlowFormer: A Transformer Architecture for Optical Flow.

Proceedings of the Computer Vision - ECCV 2022, 2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.

CoRR, 2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.

CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.

CoRR, 2021

Collaborative Visual Navigation.

CoRR, 2021

Scalable Transformers for Neural Machine Translation.

CoRR, 2021

Decoupled Spatial-Temporal Transformer for Video Inpainting.

CoRR, 2021

Searching Parameterized AP Loss for Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Proceedings of the 9th International Conference on Learning Representations, 2021

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Exploring Cross-Image Pixel Contrast for Semantic Segmentation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Influence Selection for Active Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Unsupervised Object Detection With LIDAR Clues.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask.

CoRR, 2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations.

Proceedings of the 8th International Conference on Learning Representations, 2020

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation.

Proceedings of the 8th International Conference on Learning Representations, 2020

Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Resolution Adaptive Networks for Efficient Inference.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Hierarchical Human Parsing With Typed Part-Relation Reasoning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.

CoRR, 2019

An Empirical Study of Spatial Attention Mechanisms in Deep Networks.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deformable ConvNets V2: More Deformable, Better Results.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection.

CoRR, 2018

Towards High Performance Video Object Detection for Mobiles.

CoRR, 2018

Learning Region Features for Object Detection.

Proceedings of the Computer Vision - ECCV 2018, 2018

Towards High Performance Video Object Detection.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Relation Networks for Object Detection.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Flow-Guided Feature Aggregation for Video Object Detection.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Deformable Convolutional Networks.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Deep Feature Flow for Video Recognition.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Fully Convolutional Instance-Aware Semantic Segmentation.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

R-FCN: Object Detection via Region-based Fully Convolutional Networks.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Instance-Sensitive Fully Convolutional Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Instance-Aware Semantic Segmentation via Multi-task Network Cascades.

Jifeng Dai

Kaiming He

Jian Sun

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Generative Modeling of Convolutional Neural Networks.

Jifeng Dai

Ying Nian Wu

Proceedings of the 3rd International Conference on Learning Representations, 2015

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation.

Jifeng Dai

Kaiming He

Jian Sun

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Convolutional feature masking for joint object and stuff segmentation.

Jifeng Dai

Kaiming He

Jian Sun

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Unsupervised Learning of Dictionaries of Hierarchical Compositional Models.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013

Cosegmentation and Cosketch by Unsupervised Learning.

Proceedings of the IEEE International Conference on Computer Vision, 2013

2012

Robust and Efficient Ridge-Based Palmprint Matching.

Jifeng Dai

Jianjiang Feng

Jie Zhou

IEEE Trans. Pattern Anal. Mach. Intell., 2012

Mining sub-categories for object detection.

Jifeng Dai

Jianjiang Feng

Jie Zhou

Proceedings of the 21st International Conference on Pattern Recognition, 2012

2011

Multifeature-Based High-Resolution Palmprint Recognition.

Jifeng Dai

Jie Zhou

IEEE Trans. Pattern Anal. Mach. Intell., 2011
