Fangxun Shu

Orcid: 0009-0004-9365-5993

According to our database, Fangxun Shu authored at least 17 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning.
CoRR, April 2025

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation.
CoRR, March 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation.
CoRR, March 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Autoregressive Pretraining with Mamba in Vision.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
MAC: Masked Contrastive Pre-Training for Efficient Video-Text Retrieval.
IEEE Trans. Multim., 2024

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts.
CoRR, 2024

SAG: Style-Aligned Article Generation via Model Collaboration.
CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.
CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024

2023
Compress & Align: Curating Image-Text Data with Human Knowledge.
CoRR, 2023

Audio-Visual LLM for Video Understanding.
CoRR, 2023

2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval.
CoRR, 2022
