Linli Yao

ORCID: 0000-0002-9809-8864

According to our database¹, Linli Yao authored at least 26 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video.

CoRR, May, 2026

Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models.

CoRR, May, 2026

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning.

CoRR, May, 2026

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents.

CoRR, April, 2026

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions.

CoRR, February, 2026

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models.

CoRR, January, 2026

RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence.

CoRR, October, 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration.

CoRR, October, 2025

Mitigating Overthinking through Reasoning Shaping.

CoRR, October, 2025

RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection.

CoRR, May, 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Temporal Reasoning Transfer from Text to Video.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Generative Frame Sampler for Long Video Understanding.

Findings of the Association for Computational Linguistics, 2025

2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models.

CoRR, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Edit As You Wish: Video Caption Editing with Multi-grained User Control.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Edit As You Wish: Video Description Editing with Multi-grained Commands.

CoRR, 2023

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge.

Proceedings of the ACM Web Conference 2023, 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

2022

Image Difference Captioning with Pre-training and Contrastive Learning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos.

CoRR, 2020

2019

RUC at MediaEval 2019: Video Memorability Prediction Based on Visual Textual and Concept Related Features.

Working Notes Proceedings of the MediaEval 2019 Workshop, 2019
