Xizhou Zhu

According to our database, Xizhou Zhu authored at least 52 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.
CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization.
CoRR, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.
CoRR, 2024

2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.
CoRR, 2023

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.
CoRR, 2023

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models.
CoRR, 2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
CoRR, 2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.
CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.
CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Goal-oriented Autonomous Driving.
CoRR, 2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.
CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning.
CoRR, 2022

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation.
CoRR, 2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks.
CoRR, 2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition.
CoRR, 2021

Collaborative Visual Navigation.
CoRR, 2021

Searching Parameterized AP Loss for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Proceedings of the 9th International Conference on Learning Representations, 2021

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.
Proceedings of the 9th International Conference on Learning Representations, 2021

Unsupervised Object Detection With LIDAR Clues.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.
CoRR, 2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations.
Proceedings of the 8th International Conference on Learning Representations, 2020

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation.
Proceedings of the 8th International Conference on Learning Representations, 2020

Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
An Empirical Study of Spatial Attention Mechanisms in Deep Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deformable ConvNets V2: More Deformable, Better Results.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection.
CoRR, 2018

Towards High Performance Video Object Detection for Mobiles.
CoRR, 2018

Towards High Performance Video Object Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Flow-Guided Feature Aggregation for Video Object Detection.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Deep Feature Flow for Video Recognition.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
An Uncertainty-Aware Approach for Exploratory Microblog Retrieval.
IEEE Trans. Vis. Comput. Graph., 2016

