Yunlong Tang
ORCID: 0000-0003-2796-1787
Affiliations:
- University of Rochester, NY, USA
- Tencent (China), Shenzhen, China (former)
- Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China (former)
According to our database, Yunlong Tang authored at least 29 papers between 2022 and 2025.
Online presence: orcid.org
Bibliography
2025
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
CoRR, May, 2025
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models.
CoRR, May, 2025
The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.
CoRR, April, 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
CoRR, April, 2025
CoRR, April, 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.
CoRR, March, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Proceedings of the International Conference on 3D Vision, 2025
2024
CoRR, 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
CoRR, 2024
CoRR, 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
CoRR, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024
2023
CoRR, 2023
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning.
CoRR, 2023
CoRR, 2023
2022
Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward.
Proceedings of the Computer Vision - ACCV 2022, 2022