Renrui Zhang

Orcid: 0000-0003-4503-5277

According to our database, Renrui Zhang authored at least 76 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning.
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation.
CoRR, 2023

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

Language-Assisted 3D Scene Understanding.
CoRR, 2023

Gradient-based Parameter Selection for Efficient Fine-Tuning.
CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.
CoRR, 2023

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.
CoRR, 2023

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything.
CoRR, 2023

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners.
CoRR, 2023

Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

DS-Point: A Dual-Scale 3D Framework for Point Cloud Understanding.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Event-Based Video Frame Interpolation.
IROS, 2023

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

iQuery: Instruments as Queries for Audio-Visual Sound Separation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning.
CoRR, 2022

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Can Language Understand Depth?
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Can Language Understand Depth?
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts.
CoRR, 2021

DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

Differential Privacy Protection and Game Analysis of Intelligent Transportation Data.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2019
A variational image segmentation method exploring both intensity means and texture patterns.
Signal Process. Image Commun., 2019

