Linli Yao

Orcid: 0000-0002-9809-8864

According to our database, Linli Yao authored at least 23 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents.
CoRR, April, 2026

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions.
CoRR, February, 2026

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models.
CoRR, January, 2026

RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence.
CoRR, October, 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration.
CoRR, October, 2025

Mitigating Overthinking through Reasoning Shaping.
CoRR, October, 2025

RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection.
CoRR, May, 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Temporal Reasoning Transfer from Text to Video.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Generative Frame Sampler for Long Video Understanding.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models.
CoRR, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Edit As You Wish: Video Caption Editing with Multi-grained User Control.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Edit As You Wish: Video Description Editing with Multi-grained Commands.
CoRR, 2023

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge.
Proceedings of the ACM Web Conference 2023, 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

2022
Image Difference Captioning with Pre-training and Contrastive Learning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos.
CoRR, 2020

2019
RUC at MediaEval 2019: Video Memorability Prediction Based on Visual Textual and Concept Related Features.
Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

