Chi Chen

Affiliations:

Tsinghua University, Beijing, China

According to our database, Chi Chen authored at least 23 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation.

CoRR, October, 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe.

CoRR, September, 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning.

CoRR, June, 2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.

CoRR, May, 2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model.

CoRR, May, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding.

CoRR, March, 2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition.

CoRR, March, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.

CoRR, January, 2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer.

CoRR, 2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding.

CoRR, 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models.

CoRR, 2024

Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Model Composition for Multimodal Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models.

CoRR, 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Weakly Supervised Vision-and-Language Pre-training with Relative Representations.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Mask-Align: Self-Supervised Neural Word Alignment.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021