Chi Chen

Affiliations:
  • Tsinghua University, Beijing, China


According to our database, Chi Chen authored at least 21 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning.
CoRR, June, 2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
CoRR, May, 2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model.
CoRR, May, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding.
CoRR, March, 2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition.
CoRR, March, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.
CoRR, January, 2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer.
CoRR, 2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding.
CoRR, 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models.
CoRR, 2024

Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Model Composition for Multimodal Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models.
CoRR, 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Weakly Supervised Vision-and-Language Pre-training with Relative Representations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Mask-Align: Self-Supervised Neural Word Alignment.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

