Fangxun Shu

Orcid: 0009-0004-9365-5993

According to our database, Fangxun Shu authored at least 17 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning.
CoRR, April 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.
CoRR, March 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation.
CoRR, March 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Autoregressive Pretraining with Mamba in Vision.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval.
IEEE Trans. Multim., 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
CoRR, 2024

SAG: Style-Aligned Article Generation via Model Collaboration.
CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.
CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024

2023
Compress & Align: Curating Image-Text Data with Human Knowledge.
CoRR, 2023

Audio-Visual LLM for Video Understanding.
CoRR, 2023

2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval.
CoRR, 2022
