Yunlong Tang

ORCID: 0000-0003-2796-1787

Affiliations:

University of Rochester, NY, USA
Tencent (China), Shenzhen, China (former)
Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China (former)

According to our database, Yunlong Tang authored at least 31 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory.

CoRR, October, 2025

VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and Results.

CoRR, September, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

CoRR, June, 2025

ZeroSep: Separate Anything in Audio with Zero Training.

CoRR, May, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

CoRR, May, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models.

CoRR, May, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.

CoRR, April, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.

CoRR, April, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).

CoRR, April, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models.

CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.

CoRR, March, 2025

Generative AI for Cel-Animation: A Survey.

CoRR, January, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

GaussianStyle: Gaussian Head Avatar via StyleGAN.

Proceedings of the International Conference on 3D Vision, 2025

2024

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.

CoRR, 2024

Scaling Concept With Text-Guided Diffusion Models.

CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.

CoRR, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?

CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.

CoRR, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

2023

Video Understanding with Large Language Models: A Survey.

CoRR, 2023

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad.

Siting Xu, Yunlong Tang, Feng Zheng

CoRR, 2023

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning.

CoRR, 2023

Caption Anything: Interactive Image Description with Diverse Multimodal Controls.

CoRR, 2023

2022

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward.

Proceedings of the Computer Vision - ACCV 2022, 2022
