Zihao Yue

Orcid: 0000-0002-3470-5442

According to our database, Zihao Yue authored at least 20 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video.
CoRR, May, 2026

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining.
CoRR, May, 2026

Exploring Attention Attractors in Large Language Models.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation.
CoRR, December, 2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report.
CoRR, November, 2025

MiMo-VL Technical Report.
CoRR, June, 2025

MiMo: Unlocking the Reasoning Potential of Language Model - From Pretraining to Posttraining.
CoRR, May, 2025

TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM.
CoRR, March, 2025

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
CoRR, March, 2025

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

ChartM³: Benchmarking Chart Editing with Multimodal Instructions.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Unified Multimodal Understanding via Byte-Pair Visual Encoding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VideoOrion: Tokenizing Object Dynamics in Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Movie101v2: Improved Movie Narration Benchmark.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Movie101: A New Movie Understanding Benchmark.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
MovieUN: A Dataset for Movie Understanding and Narrating.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

