Yaohui Wang

Orcid: 0009-0002-9487-6187

Affiliations:
  • Shanghai Artificial Intelligence Laboratory, China
  • University of Côte d'Azur, Nice, France (PhD 2021)
  • INRIA, STARS, Sophia-Antipolis, France (former)


According to our database, Yaohui Wang authored at least 59 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., September, 2025

CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models.
CoRR, August, 2025

LIA-X: Interpretable Latent Portrait Animator.
CoRR, August, 2025

Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers.
CoRR, August, 2025

GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects.
CoRR, June, 2025

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models.
Int. J. Comput. Vis., May, 2025

Training-free Stylized Text-to-Image Generation with Fast Inference.
CoRR, May, 2025

LEO: Generative Latent Image Animator for Human Video Synthesis.
Int. J. Comput. Vis., March, 2025

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset.
CoRR, March, 2025

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision.
CoRR, March, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.
CoRR, March, 2025

Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation.
CoRR, February, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.
CoRR, January, 2025

Latte: Latent Diffusion Transformer for Video Generation.
Trans. Mach. Learn. Res., 2025

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Consistent and Controllable Image Animation with Motion Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
LIA: Latent Image Animator.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

View-Invariant Skeleton Action Representation Learning via Motion Retargeting.
Int. J. Comput. Vis., July, 2024

Uncertainty-aware image inpainting with adaptive feedback network.
Expert Syst. Appl., January, 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.
CoRR, 2024

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models.
CoRR, 2024

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models.
CoRR, 2024

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion.
CoRR, 2024

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

4Diffusion: Multi-view Video Diffusion Model for 4D Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Vlogger: Make Your Dream A Vlog.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SinSR: Diffusion-Based Image Super-Resolution in a Single Step.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ConditionVideo: Training-Free Condition-Guided Video Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Learning Invariance From Generated Variance for Unsupervised Person Re-Identification.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
CoRR, 2023

LEO: Generative Latent Image Animator for Human Video Synthesis.
CoRR, 2023

Long-Term Rhythmic Video Soundtracker.
Proceedings of the International Conference on Machine Learning, 2023

LAC - Latent Action Composition for Skeleton-based Action Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-Supervised Video Representation Learning via Latent Time Navigation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting.
CoRR, 2022

Latent Image Animator: Learning to Animate Images via Latent Space Navigation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Learning to Generate Human Videos. (Apprendre à Générer des Vidéos de Personnes).
PhD thesis, 2021

InMoDeGAN: Interpretable Motion Decomposition Generative Adversarial Network for Video Generation.
CoRR, 2021

Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Self-Supervised Video Pose Representation Learning for Occlusion-Robust Action Recognition.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021

Emotion Editing in Head Reenactment Videos using Latent Space Manipulation.
Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition, 2021

Joint Generative and Contrastive Learning for Unsupervised Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
ImaGINator: Conditional Spatio-Temporal GAN for Video Generation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

A video is worth more than 1000 lies. Comparing 3DCNN approaches for detecting deepfakes.
Proceedings of the 15th IEEE International Conference on Automatic Face and Gesture Recognition, 2020

G3AN: Disentangling Appearance and Motion for Video Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
G³AN: This video does not exist. Disentangling motion and appearance for video generation.
CoRR, 2019

2018
Comparing Methods for Assessment of Facial Dynamics in Patients with Major Neurocognitive Disorders.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

From Attribute-Labels to Faces: Face Generation Using a Conditional Generative Adversarial Network.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

From attributes to faces: a conditional generative network for face generation.
Proceedings of the 2018 International Conference of the Biometrics Special Interest Group, 2018

