Zhengyang Liang

Orcid: 0009-0008-0205-0163

According to our database, Zhengyang Liang authored at least 21 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Explicit Critic Guidance for Aligning Diffusion Models.
CoRR, May, 2026

DeepXiv-SDK: An Agentic Data Interface for Scientific Literature.
CoRR, March, 2026

VideoCreator: An Agentic System for Multi-turn Video Production.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

2025
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web.
CoRR, December, 2025

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist.
CoRR, November, 2025

TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos.
CoRR, September, 2025

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification.
CoRR, June, 2025

MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos.
CoRR, February, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
CoRR, February, 2025

Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Retrieval.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MLVU: Benchmarking Multi-task Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation.
Trans. Mach. Learn. Res., 2024

Scaling Laws For Diffusion Transformers.
CoRR, 2024

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions.
CoRR, 2024

Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning.
CoRR, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2021
A Hypothesis for the Aesthetic Appreciation in Neural Networks.
CoRR, 2021

