Zhenheng Yang

Orcid: 0000-0003-0303-5885

According to our database, Zhenheng Yang authored at least 49 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding.

CoRR, March, 2026

UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^128 for Unified Multimodal Large Language Model.

CoRR, February, 2026

BitDance: Scaling Autoregressive Generative Models with Binary Tokens.

CoRR, February, 2026

Implicit Neural Representation Facilitates Unified Universal Vision Encoding.

CoRR, January, 2026

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching.

Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, 2026

UniAPO: Unified Multimodal Automated Prompt Optimization.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling.

CoRR, December, 2025

AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models.

CoRR, December, 2025

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation.

CoRR, November, 2025

FOCUS: Efficient Keyframe Selection for Long Video Understanding.

CoRR, October, 2025

Improving Text-to-Image Generation with Input-Side Inference-Time Scaling.

CoRR, October, 2025

Growing Visual Generative Capacity for Pre-Trained MLLMs.

CoRR, October, 2025

Mixture of Contexts for Long Video Generation.

CoRR, August, 2025

UniCode2: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation.

CoRR, June, 2025

Show-o2: Improved Native Unified Multimodal Models.

Jinheng Xie

Zhenheng Yang

Mike Zheng Shou

CoRR, June, 2025

MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs.

CoRR, June, 2025

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning.

Weijia Mao

Zhenheng Yang

Mike Zheng Shou

CoRR, May, 2025

Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning.

CoRR, March, 2025

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths.

Weijia Mao

Zhenheng Yang

Mike Zheng Shou

CoRR, February, 2025

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Star: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Long Context Tuning for Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Parallelized Autoregressive Visual Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2021

Weakly Supervised Instance Segmentation for Videos With Temporal Mask Consistency.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

Enhancing Model Parallelism in Neural Architecture Search for Multidevice System.

IEEE Micro, 2020

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization.

CoRR, 2020

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Activity Driven Weakly Supervised Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos.

CoRR, 2018

Face and Body Association for Video-Based Face Recognition.

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding.

Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

LEGO: Learning Edge With Geometry All at Once by Watching Videos.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Occlusion Aware Unsupervised Learning of Optical Flow.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Unsupervised Learning of Geometry From Videos With Edge-Aware Depth-Normal Consistency.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Occlusion Aware Unsupervised Learning of Optical Flow.

CoRR, 2017

Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency.

CoRR, 2017

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals.

Proceedings of the IEEE International Conference on Computer Vision, 2017

TALL: Temporal Activity Localization via Language Query.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation.

Zhenheng Yang

Jiyang Gao

Ram Nevatia

Proceedings of the British Machine Vision Conference 2017, 2017

RED: Reinforced Encoder-Decoder Networks for Action Anticipation.

Jiyang Gao

Zhenheng Yang

Ram Nevatia

Proceedings of the British Machine Vision Conference 2017, 2017

Cascaded Boundary Regression for Temporal Action Detection.

Jiyang Gao

Zhenheng Yang

Ram Nevatia

Proceedings of the British Machine Vision Conference 2017, 2017

2016

A multi-scale cascade fully convolutional network face detector.

Zhenheng Yang

Ramakant Nevatia

Proceedings of the 23rd International Conference on Pattern Recognition, 2016
