Dongzhi Jiang

According to our database, Dongzhi Jiang authored at least 26 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
GenClaw: Code-Driven Agentic Image Generation.
CoRR, May, 2026

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation.
CoRR, March, 2026

Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation.
CoRR, February, 2026

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

2025
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation.
CoRR, December, 2025

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation.
CoRR, December, 2025

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards.
CoRR, December, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.
CoRR, October, 2025

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception.
CoRR, October, 2025

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation.
CoRR, August, 2025

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning.
CoRR, June, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.
CoRR, May, 2025

ADT: Tuning Diffusion Models with Adversarial Supervision.
CoRR, April, 2025

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems.
CoRR, March, 2025

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

