Yiwu Zhong

According to our database, Yiwu Zhong authored at least 27 papers between 2020 and 2026.

Bibliography

2026
Preserve Support, Not Correspondence: Dynamic Routing for Offline Reinforcement Learning.
CoRR, April, 2026

DOSE: Data Selection for Multi-Modal LLMs via Off-the-Shelf Models.
CoRR, April, 2026

TextShield-R1: Reinforced Reasoning for Tampered Text Detection.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Rethinking Chain-of-Thought Reasoning for Videos.
CoRR, December, 2025

Webly-Supervised Image Manipulation Localization via Category-Aware Auto-Annotation.
CoRR, August, 2025

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Fine-Grained Spatiotemporal Grounding on Egocentric Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

PAVE: Patching and Adapting Video Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Revisiting Tampered Scene Text Detection in the Era of Generative AI.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning.
CoRR, 2024

Omni-IML: Towards Unified Image Manipulation Localization.
CoRR, 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models.
CoRR, 2024

Generalized Tampered Scene Text Detection in the era of Generative AI.
CoRR, 2024

Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models.
CoRR, 2024

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Enhancing Temporal Modeling of Video LLMs via Time Gating.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Towards Learning a Generalist Model for Embodied Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.
CoRR, 2023

Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models.
CoRR, 2023

Learning Concise and Descriptive Attributes for Visual Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
RegionCLIP: Region-based Language-Image Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Grounded Language-Image Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Learning to Generate Scene Graph from Natural Language Supervision.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

A Simple Baseline for Weakly-Supervised Scene Graph Generation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Comprehensive Image Captioning via Scene Graph Decomposition.
Proceedings of the Computer Vision - ECCV 2020, 2020
