Shitian Zhao

According to our database¹, Shitian Zhao authored at least 22 papers between 2024 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models.

IEEE Trans. Medical Imaging, June, 2026

Lumina-mGPT: Flexible Photorealistic Autoregressive Text-to-Image Generation.

Int. J. Comput. Vis., April, 2026

PyVision-RL: Forging Open Agentic Vision Models via RL.

CoRR, February, 2026

2025

Debiasing Medical Knowledge for Prompting Universal Model in CT Image Segmentation.

IEEE Trans. Medical Imaging, December, 2025

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning.

CoRR, November, 2025

PyVision: Agentic Vision with Dynamic Tooling.

CoRR, July, 2025

Sekai: A Video Dataset towards World Exploration.

CoRR, June, 2025

OmniCaptioner: One Captioner to Rule Them All.

CoRR, April, 2025

LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.

CoRR, March, 2025

Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning.

CoRR, March, 2025

Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models.

CoRR, March, 2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.

CoRR, January, 2025

To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Fontanimate: High Quality Few-Shot Font Generation Via Animating Font Transfer Process.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
