Chi Chen

Orcid: 0000-0001-8008-7043

Affiliations:

Tsinghua University, Beijing, China

According to our database¹, Chi Chen authored at least 38 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning.

CoRR, May, 2026

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction.

CoRR, April, 2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation.

CoRR, March, 2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space.

CoRR, February, 2026

LLaVA-UHD v2: Exploiting Hierarchical Vision Granularity in MLLMs via Inverse Semantic Pyramid.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios?

CoRR, December, 2025

LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs.

CoRR, November, 2025

VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation.

CoRR, October, 2025

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe.

CoRR, September, 2025

MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.

CoRR, May, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding.

CoRR, March, 2025

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition.

CoRR, March, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.

CoRR, January, 2025

CITR: Efficient Long Video Understanding Needs Causal Importance.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Author Correction: Predicting equilibrium distributions for molecular systems with deep learning.

Nat. Mac. Intell., 2024

Predicting equilibrium distributions for molecular systems with deep learning.

Nat. Mac. Intell., 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer.

CoRR, 2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding.

CoRR, 2024

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models.

CoRR, 2024

Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CODIS: Benchmarking Context-dependent Visual Comprehension for Multimodal Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Model Composition for Multimodal Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models.

CoRR, 2023

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning.

CoRR, 2023

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Weakly Supervised Vision-and-Language Pre-training with Relative Representations.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

EXTR: Click-Through Rate Prediction with Externalities in E-Commerce Sponsored Search.

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

End-to-End Unsupervised Vision-and-Language Pre-training with Referring Expression Matching.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Mask-Align: Self-Supervised Neural Word Alignment.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2019

Investment Behaviors Can Tell What Inside: Exploring Stock Intrinsic Properties for Stock Trend Prediction.

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

2017

When Will a Repost Cascade Settle Down?

Proceedings of the Web Information Systems Engineering - WISE 2017, 2017

A System for Recognizing Entities and Extracting Relations from Electronic Medical Records.

Proceedings of the 14th Web Information Systems and Applications Conference, 2017