Pengfei Wan

ORCID: 0000-0001-7225-565X

Affiliations:
  • Kuaishou Technology, Beijing, China
  • Meitu Inc., Beijing, China (former)
  • Hong Kong University of Science and Technology, Hong Kong (PhD 2015)


According to our database, Pengfei Wan authored at least 120 papers between 2012 and 2025.

Bibliography

2025
Score Augmentation for Diffusion Models.
CoRR, August, 2025

DVIS++: Improved Decoupled Framework for Universal Video Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

Imbalance in Balance: Online Concept Balancing in Generation Models.
CoRR, July, 2025

VMoBA: Mixture-of-Block Attention for Video Diffusion Models.
CoRR, June, 2025

SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution.
CoRR, June, 2025

UNIC: Unified In-Context Video Editing.
CoRR, June, 2025

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers.
CoRR, June, 2025

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval.
CoRR, June, 2025

CamCloneMaster: Enabling Reference-based Camera Control for Video Generation.
CoRR, June, 2025

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control.
CoRR, June, 2025

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers.
CoRR, May, 2025

Scaling Image and Video Generation via Test-Time Evolutionary Search.
CoRR, May, 2025

Training-Free Efficient Video Generation via Dynamic Token Carving.
CoRR, May, 2025

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption.
CoRR, May, 2025

Flow-GRPO: Training Flow Matching Models via Online RL.
CoRR, May, 2025

A Survey of Interactive Generative Video.
CoRR, April, 2025

BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation.
CoRR, April, 2025

SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning.
CoRR, April, 2025

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation.
CoRR, March, 2025

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment.
CoRR, March, 2025

SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain.
CoRR, March, 2025

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention.
CoRR, March, 2025

Boosting Resolution Generalization of Diffusion Transformers with Randomized Positional Encodings.
CoRR, March, 2025

Position: Interactive Generative Video as Next-Generation Game Engine.
CoRR, March, 2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers.
CoRR, March, 2025

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video.
CoRR, March, 2025

MTV-Inpaint: Multi-Task Long Video Inpainting.
CoRR, March, 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis.
CoRR, March, 2025

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification.
CoRR, March, 2025

FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems.
CoRR, February, 2025

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation.
CoRR, February, 2025

Improving Video Generation with Human Feedback.
CoRR, January, 2025

GameFactory: Creating New Games with Generative Interactive Videos.
CoRR, January, 2025

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning.
CoRR, January, 2025

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Stable Segment Anything Model.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Towards Precise Scaling Laws for Video Diffusion Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

StyleMaster: Stylize Your Video with Artistic Generation and Translation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SketchVideo: Sketch-based Video Generation and Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Owl-1: Omni World Model for Consistent Long Video Generation.
CoRR, 2024

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints.
CoRR, 2024

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing.
CoRR, 2024

MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding.
CoRR, 2024

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs.
CoRR, 2024

ViMo: Generating Motions from Casual Videos.
CoRR, 2024

4Dynamic: Text-to-4D Generation with Hybrid Priors.
CoRR, 2024

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control.
CoRR, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.
CoRR, 2024

A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies.
CoRR, 2024

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance.
CoRR, 2024

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark.
CoRR, 2024

Motion Inversion for Video Customization.
CoRR, 2024

Towards Unified 3D Hair Reconstruction from Single-View Portraits.
Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

VRMM: A Volumetric Relightable Morphable Head Model.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

VideoTetris: Towards Compositional Text-to-Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

PlacidDreamer: Advancing Harmony in Text-to-3D Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Agent Attention: On the Integration of Softmax and Linear Attention.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Multi-Modal Face Stylization with a Generative Prior.
Comput. Graph. Forum, October, 2023

Snowflake Point Deconvolution for Point Cloud Completion and Generation With Skip-Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2023

EM-Gaze: eye context correlation and metric learning for gaze estimation.
Vis. Comput. Ind. Biomed. Art, 2023

Predicting Personalized Head Movement From Short Video and Speech Signal.
IEEE Trans. Multim., 2023

PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models.
CoRR, 2023

Stable Segment Anything Model.
CoRR, 2023

Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery.
CoRR, 2023

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation.
CoRR, 2023

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation.
CoRR, 2023

Towards Practical Capture of High-Fidelity Relightable Avatars.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Augmentation-Aware Self-Supervision for Data-Efficient GAN Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Automatic Human Scene Interaction through Contact Estimation and Motion Adaptation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DVIS: Decoupled Video Instance Segmentation Framework.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FEditNet: Few-Shot Editing of Latent Semantics in GAN Spaces.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Bridging CLIP and StyleGAN through Latent Alignment for Image Editing.
CoRR, 2022

ITTR: Unpaired Image-to-Image Translation with Transformers.
CoRR, 2022

Debiased Self-Training for Semi-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning an Inference-accelerated Network from a Pre-trained Model with Frequency-enhanced Feature Distillation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Exploring Set Similarity for Dense Self-supervised Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Assessing a Single Image in Reference-Guided Image Synthesis.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Write-An-Animation: High-level Text-based Animation Editing with Character-Scene Interaction.
Comput. Graph. Forum, 2021

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PMP-Net: Point Cloud Completion by Learning Multi-Step Point Moving Paths.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Cycle4Completion: Unpaired Point Cloud Completion Using Cycle Transformation With Missing Region Coding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study.
IEEE Trans. Image Process., 2020

2019
High Bit-Depth Image Acquisition Framework Using Embedded Quantization Bias.
IEEE Trans. Computational Imaging, 2019

GraphPoseGAN: 3D Hand Pose Estimation from a Monocular RGB Image via Adversarial Learning on Graphs.
CoRR, 2019



2018
Range Scaling Global U-Net for Perceptual Image Enhancement on Mobile Devices.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

2016
Image Bit-Depth Enhancement via Maximum A Posteriori Estimation of AC Signal.
IEEE Trans. Image Process., 2016

2015
Precision Enhancement of 3-D Surfaces from Compressed Multiview Depth Maps.
IEEE Signal Process. Lett., 2015

Motion vector fields based video coding.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

2014
Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps.
CoRR, 2014

Solving dense stereo matching via quadratic programming.
Proceedings of the 2014 IEEE Visual Communications and Image Processing Conference, 2014

A fast intermode decision algorithm based on analysis of inter prediction residual.
Proceedings of the IEEE 16th International Workshop on Multimedia Signal Processing, 2014

Image bit-depth enhancement via maximum-a-posteriori estimation of graph AC component.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

High bit-precision image acquisition and reconstruction by planned sensor distortion.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

DCT coefficients generation model for film grain noise and its application in super-resolution.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Palette-based compound image compression in HEVC by exploiting non-local spatial correlation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

SSIM-based rate-distortion optimization in H.264.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Improved temporal psychovisual modulation for backward-compatible stereoscopic display.
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

Fast binary motion estimation for screen content video coding.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
3-D Motion Estimation for Visual Saliency Modeling.
IEEE Signal Process. Lett., 2013

Precision enhancement of 3D surfaces from multiple quantized depth maps.
Proceedings of the 11th IVMSP Workshop: 3D Image/Video Technologies and Applications, 2013

Personal photo album compression and management.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Optimal dependent bit allocation for AVS intra-frame coding via successive convex approximation.
Proceedings of the IEEE International Conference on Image Processing, 2013

A robust interpolation-free approach for sub-pixel accuracy motion estimation.
Proceedings of the IEEE International Conference on Image Processing, 2013

3D motion in visual saliency modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

2012
From 2D Extrapolation to 1D Interpolation: Content Adaptive Image Bit-Depth Expansion.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Image de-quantization via spatially varying sparsity prior.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

Super resolution for subpixel-based downsampled images.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

