Zihao Yue

ORCID: 0000-0002-3470-5442

According to our database¹, Zihao Yue authored at least 20 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video.

CoRR, May, 2026

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining.

CoRR, May, 2026

Exploring Attention Attractors in Large Language Models.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation.

CoRR, December, 2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report.

CoRR, November, 2025

MiMo-VL Technical Report.

CoRR, June, 2025

MiMo: Unlocking the Reasoning Potential of Language Model - From Pretraining to Posttraining.

CoRR, May, 2025

TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM.

CoRR, March, 2025

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.

CoRR, March, 2025

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ChartM³: Benchmarking Chart Editing with Multimodal Instructions.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Unified Multimodal Understanding via Byte-Pair Visual Encoding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VideoOrion: Tokenizing Object Dynamics in Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Movie101v2: Improved Movie Narration Benchmark.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Unveiling Visual Biases in Audio-Visual Localization Benchmarks.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective.

Zihao Yue, Liang Zhang, Qin Jin

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Movie101: A New Movie Understanding Benchmark.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

MovieUN: A Dataset for Movie Understanding and Narrating.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
