Renrui Zhang

Orcid: 0009-0009-5414-7087

According to our database¹, Renrui Zhang authored at least 174 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction.

CoRR, May, 2026

Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning.

CoRR, May, 2026

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models.

CoRR, April, 2026

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification.

CoRR, April, 2026

PEARL: Personalized Streaming Video Understanding Model.

CoRR, March, 2026

MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints.

CoRR, March, 2026

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond.

CoRR, March, 2026

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training.

CoRR, February, 2026

GENIUS: Generative Fluid Intelligence Evaluation Suite.

CoRR, February, 2026

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation.

CoRR, February, 2026

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models.

CoRR, January, 2026

Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs.

CoRR, January, 2026

LaST₀: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model.

CoRR, January, 2026

Spatially-enhanced Spiking neural network for efficient point cloud analysis.

Neural Networks, 2026

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.

Victor Shea-Jay Huang

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation.

CoRR, December, 2025

ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation.

CoRR, December, 2025

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following.

CoRR, November, 2025

VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging.

CoRR, November, 2025

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation.

CoRR, November, 2025

LLM-Driven Cognitive Modeling for Personalized Travel Generation.

IEEE Trans. Comput. Soc. Syst., October, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.

CoRR, October, 2025

Generative Universal Verifier as Multimodal Meta-Reasoner.

CoRR, October, 2025

BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities.

CoRR, October, 2025

Can World Models Benefit VLMs for World Dynamics?

CoRR, October, 2025

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation.

CoRR, September, 2025

GLEAM: Learning to Match and Explain in Cross-View Geo-Localization.

CoRR, September, 2025

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs.

CoRR, August, 2025

MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models.

CoRR, August, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.

Victor Shea-Jay Huang

CoRR, July, 2025

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation.

CoRR, July, 2025

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning.

CoRR, June, 2025

Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning.

CoRR, June, 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs.

CoRR, May, 2025

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking.

CoRR, May, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.

CoRR, May, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving.

CoRR, April, 2025

Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization.

CoRR, March, 2025

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model.

CoRR, March, 2025

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step.

CoRR, January, 2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.

CoRR, January, 2025

Language-Assisted 3D Scene Understanding.

IEEE Trans. Multim., 2025

LLaVA-OneVision: Easy Visual Task Transfer.

Trans. Mach. Learn. Res., 2025

3DAxisPrompt: Promoting the 3D grounding and reasoning in GPT-4o.

Neurocomputing, 2025

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

TAR3D: Creating High-Quality 3D Assets Via Next-Part Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Detect Anything 3D in the Wild.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Chimera: Improving Generalist Model with Domain-Specific Experts.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Let's Verify and Reinforce Image Generation Step by Step.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

Int. J. Comput. Vis., May, 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.

Int. J. Comput. Vis., February, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.

CoRR, 2024

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation.

CoRR, 2024

Point Cloud Understanding via Attention-Driven Contrastive Learning.

CoRR, 2024

Training-free Regional Prompting for Diffusion Transformers.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.

CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.

CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.

CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.

CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.

CoRR, 2024

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.

CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.

CoRR, 2024

TripletMix: Triplet Data Augmentation for 3D Understanding.

CoRR, 2024

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation.

CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Knowledge Refinement: An Interpretable Analytics for Travel Behaviors Based on Knowledge Automation.

Peijun Ye, Renrui Zhang, Shichao Ge

Proceedings of the 27th IEEE International Conference on Intelligent Transportation Systems, 2024

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Proceedings of the Computer Vision - ECCV 2024, 2024

PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation.

Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Gradient-based Parameter Selection for Efficient Fine-Tuning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Cloud-Device Collaborative Learning for Multimodal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.

CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.

CoRR, 2023

Language-Assisted 3D Scene Understanding.

CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.

CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.

CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.

CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.

CoRR, 2023

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything.

CoRR, 2023

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.

CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.

CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.

CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.

CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.

CoRR, 2023

Personalize Segment Anything Model with One Shot.

CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.

CoRR, 2023

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance.

CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.

CoRR, 2023

Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

DS-Point: A Dual-Scale 3D Framework for Point Cloud Understanding.

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Event-Based Video Frame Interpolation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

iQuery: Instruments as Queries for Audio-Visual Sound Separation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System.

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning.

CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.

CoRR, 2022

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning.

CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.

CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.

CoRR, 2022

Can Language Understand Depth?

Renrui Zhang, Ziyao Zeng, Ziyu Guo

CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.

CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.

CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.

CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Can Language Understand Depth?

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.

Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.

Proceedings of the Computer Vision - ECCV 2022, 2022

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts.

CoRR, 2021

DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion.

CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.

CoRR, 2021

Dual-stream Network for Visual Recognition.

CoRR, 2021

Differential Privacy Protection and Game Analysis of Intelligent Transportation Data.

Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Dual-stream Network for Visual Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2019

A variational image segmentation method exploring both intensity means and texture patterns.

Signal Process. Image Commun., 2019
