Chi Chen

ORCID: 0000-0001-8008-7043

Affiliations:
  • Tsinghua University, Beijing, China


According to our database, Chi Chen authored at least 36 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation.
CoRR, March, 2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space.
CoRR, February, 2026

LLaVA-UHD v2: Exploiting Hierarchical Vision Granularity in MLLMs via Inverse Semantic Pyramid.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios?
CoRR, December, 2025

LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs.
CoRR, November, 2025

VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation.
CoRR, October, 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe.
CoRR, September, 2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
CoRR, May, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding.
CoRR, March, 2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition.
CoRR, March, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.
CoRR, January, 2025

CITR: Efficient Long Video Understanding Needs Causal Importance.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
Author Correction: Predicting equilibrium distributions for molecular systems with deep learning.
Nat. Mach. Intell., 2024

Predicting equilibrium distributions for molecular systems with deep learning.
Nat. Mach. Intell., 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer.
CoRR, 2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding.
CoRR, 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models.
CoRR, 2024

Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Model Composition for Multimodal Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models.
CoRR, 2023

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning.
CoRR, 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Weakly Supervised Vision-and-Language Pre-training with Relative Representations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
EXTR: Click-Through Rate Prediction with Externalities in E-Commerce Sponsored Search.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Mask-Align: Self-Supervised Neural Word Alignment.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2019
Investment Behaviors Can Tell What Inside: Exploring Stock Intrinsic Properties for Stock Trend Prediction.
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

2017
When Will a Repost Cascade Settle Down?
Proceedings of the Web Information Systems Engineering - WISE 2017, 2017

A System for Recognizing Entities and Extracting Relations from Electronic Medical Records.
Proceedings of the 14th Web Information Systems and Applications Conference, 2017

