Manyuan Zhang

ORCID: 0009-0003-2148-1085

According to our database¹, Manyuan Zhang authored at least 46 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning.

CoRR, May, 2026

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding.

CoRR, May, 2026

OpenGame: Open Agentic Coding for Games.

CoRR, April, 2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis.

CoRR, March, 2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation.

CoRR, March, 2026

AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing.

CoRR, March, 2026

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data.

CoRR, March, 2026

RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing.

CoRR, March, 2026

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence.

CoRR, March, 2026

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence.

CoRR, February, 2026

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention.

CoRR, February, 2026

Exploring Reasoning Reward Model for Agents.

CoRR, January, 2026

CTR3D: Cross-View Token Reduction for Dense Multi-View Generation.

Proceedings of the International Conference on 3D Vision, 2026

2025

AdaTooler-V: Adaptive Tool-Use for Images and Videos.

CoRR, December, 2025

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation.

CoRR, December, 2025

EditThinker: Unlocking Iterative Reasoning for Any Image Editor.

CoRR, December, 2025

OneThinker: All-in-one Reasoning Model for Image and Video.

CoRR, December, 2025

AlignVid: Training-Free Attention Scaling for Semantic Fidelity in Text-Guided Image-to-Video Generation.

CoRR, December, 2025

Architecture Decoupling Is Not All You Need For Unified Multimodal Model.

CoRR, November, 2025

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation.

CoRR, November, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark.

CoRR, October, 2025

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction.

CoRR, October, 2025

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views.

CoRR, October, 2025

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images.

CoRR, October, 2025

ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping.

CoRR, October, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.

CoRR, September, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.

CoRR, March, 2025

LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Let's Verify and Reinforce Image Generation Step by Step.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling.

Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks.

Proceedings of the Computer Vision - ECCV 2024, 2024

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Towards Large-scale Masked Face Recognition.

CoRR, 2023

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Towards Robust Face Recognition with Comprehensive Search.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Switchable K-class Hyperplanes for Noise-Robust Representation Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Complementary Boundary Generator with Scale-Invariant Relation Modeling for Temporal Action Localization: Submission to ActivityNet Challenge 2020.

CoRR, 2020

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020.

CoRR, 2020

Top-1 Solution of Multi-Moments in Time Challenge 2019.

CoRR, 2020

Discriminability Distillation in Group Representation Learning.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Towards Flops-Constrained Face Recognition.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018

Privacy-preserving Sensory Data Recovery.

CoRR, 2018

Privacy-Preserving Sensory Data Recovery.

Proceedings of the 17th IEEE International Conference On Trust, 2018

Tensor Sensing for RF Tomographic Imaging.

Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
