Manyuan Zhang

Orcid: 0009-0003-2148-1085

According to our database, Manyuan Zhang authored at least 46 papers between 2018 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning.
CoRR, May, 2026

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding.
CoRR, May, 2026

OpenGame: Open Agentic Coding for Games.
CoRR, April, 2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis.
CoRR, March, 2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation.
CoRR, March, 2026

AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing.
CoRR, March, 2026

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data.
CoRR, March, 2026

RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing.
CoRR, March, 2026

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence.
CoRR, March, 2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence.
CoRR, February, 2026

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention.
CoRR, February, 2026

Exploring Reasoning Reward Model for Agents.
CoRR, January, 2026

CTR3D: Cross-View Token Reduction for Dense Multi-View Generation.
Proceedings of the International Conference on 3D Vision, 2026

2025
AdaTooler-V: Adaptive Tool-Use for Images and Videos.
CoRR, December, 2025

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation.
CoRR, December, 2025

EditThinker: Unlocking Iterative Reasoning for Any Image Editor.
CoRR, December, 2025

OneThinker: All-in-one Reasoning Model for Image and Video.
CoRR, December, 2025

AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation.
CoRR, December, 2025

Architecture Decoupling Is Not All You Need For Unified Multimodal Model.
CoRR, November, 2025

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation.
CoRR, November, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.
CoRR, October, 2025

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction.
CoRR, October, 2025

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views.
CoRR, October, 2025

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images.
CoRR, October, 2025

ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping.
CoRR, October, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.
CoRR, September, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.
CoRR, March, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Let's Verify and Reinforce Image Generation Step by Step.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks.
Proceedings of the Computer Vision - ECCV 2024, 2024

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Towards Large-scale Masked Face Recognition.
CoRR, 2023

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Towards Robust Face Recognition with Comprehensive Search.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Switchable K-class Hyperplanes for Noise-Robust Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020.
CoRR, 2020

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020.
CoRR, 2020

Top-1 Solution of Multi-Moments in Time Challenge 2019.
CoRR, 2020

Discriminability Distillation in Group Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Towards Flops-Constrained Face Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018
Privacy-preserving Sensory Data Recovery.
CoRR, 2018

Privacy-Preserving Sensory Data Recovery.
Proceedings of the 17th IEEE International Conference On Trust, 2018

Tensor Sensing for RF Tomographic Imaging.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

