Yunlong Tang

ORCID: 0000-0003-2796-1787

Affiliations:
  • University of Rochester, NY, USA
  • Tencent (China), Shenzhen, China (former)
  • Southern University of Science and Technology, Department of Computer Science and Engineering, Shenzhen, China (former)


According to our database, Yunlong Tang authored at least 29 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Can Sound Replace Vision in LLaVA With Token Substitution?
CoRR, June, 2025

ZeroSep: Separate Anything in Audio with Zero Training.
CoRR, May, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.
CoRR, May, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models.
CoRR, May, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability.
CoRR, April, 2025

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report.
CoRR, April, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.
CoRR, April, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1).
CoRR, April, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models.
CoRR, April, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.
CoRR, March, 2025

Generative AI for Cel-Animation: A Survey.
CoRR, January, 2025

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

GaussianStyle: Gaussian Head Avatar via StyleGAN.
Proceedings of the International Conference on 3D Vision, 2025

2024
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach.
CoRR, 2024

Scaling Concept With Text-Guided Diffusion Models.
CoRR, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models.
CoRR, 2024

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
CoRR, 2024

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue.
CoRR, 2024

EAGLE: Egocentric AGgregated Language-video Engine.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024


2023
Video Understanding with Large Language Models: A Survey.
CoRR, 2023

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad.
CoRR, 2023

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning.
CoRR, 2023

Caption Anything: Interactive Image Description with Diverse Multimodal Controls.
CoRR, 2023

2022
Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward.
Proceedings of the Computer Vision - ACCV 2022, 2022

