Yihan Zeng

ORCID: 0009-0001-2441-5492

According to our database¹, Yihan Zeng authored at least 28 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis.

CoRR, October, 2025

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use.

CoRR, September, 2025

GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration.

CoRR, September, 2025

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning.

CoRR, July, 2025

CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback.

CoRR, April, 2025

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

CoRR, March, 2025

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning.

CoRR, February, 2025

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors.

CoRR, January, 2025

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning.

CoRR, 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

CoRR, 2024

DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors.

CoRR, 2024

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion.

Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation.

Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection.

Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023

SUIT: Learning Significance-Guided Information for 3D Temporal Detection.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training.

Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021

Cross-Modal 3D Object Detection and Tracking for Auto-Driving.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
