Anyi Rao

Orcid: 0000-0003-1004-7753

According to our database, Anyi Rao authored at least 55 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Triplet-Block Diffusion RWKV.

CoRR, May, 2026

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation.

Maneesh Agrawala

CoRR, May, 2026

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors.

CoRR, May, 2026

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models.

CoRR, April, 2026

InstanceAnimator: Multi-Instance Sketch Video Colorization.

CoRR, March, 2026

Controllable Text-to-Motion Generation via Modular Body-Part Phase Control.

CoRR, March, 2026

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models.

CoRR, March, 2026

ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation.

CoRR, March, 2026

SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment.

CoRR, March, 2026

Collaposer: Transforming Photo Collections into Visual Assets for Storytelling with Collages.

Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 2026

DataSway: Vivifying Metaphoric Visualization with Animation Clip Generation and Coordination.

Proceedings of the 2026 Designing Interactive Systems Conference, 2026

2025

Pretraining Frame Preservation in Autoregressive Video Memory Compression.

Gordon Wetzstein, Maneesh Agrawala

CoRR, December, 2025

Composing Concepts from Images and Videos via Concept-prompt Binding.

CoRR, December, 2025

Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration.

Maneesh Agrawala

CoRR, October, 2025

Taming Flow-based I2V Models for Creative Video Editing.

Gordon Wetzstein, Maneesh Agrawala

CoRR, September, 2025

Dense Semantic Matching with VGGT Prior.

CoRR, September, 2025

Light of Normals: Unified Feature Representation for Universal Photometric Stereo.

Satoshi Ikehata

CoRR, June, 2025

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images.

CoRR, May, 2025

Simulating the Real World: A Unified Survey of Multimodal Generative Models.

CoRR, March, 2025

CineVision: An Interactive Pre-visualization Storyboard System for Director-Cinematographer Collaboration.

Maneesh Agrawala

Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 2025

Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport.

Maneesh Agrawala

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Light-a-Video: Training-Free Video Relighting via Progressive Light Fusion.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Generative AI for Film Creation: A Survey of Recent Advances.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

Keyframe-Guided Creative Video Inpainting.

Maneesh Agrawala

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Mindalogue: LLM-Powered Nonlinear Interaction for Effective Learning and Task Exploration.

CoRR, 2024

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion.

CoRR, 2024

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database.

Jean-Peïc Chou, Maneesh Agrawala

Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Generative Models for Visual Content Editing and Creation.

Maneesh Agrawala

Proceedings of the ACM SIGGRAPH 2024 Courses, 2024

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.

Zhengyang Liang, Maneesh Agrawala

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.

Maneesh Agrawala

Proceedings of the Computer Vision - ECCV 2024, 2024

Cinematic Behavior Transfer via NeRF-based Differentiable Filming.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

A Coarse-to-Fine Framework for Automatic Video Unscreen.

IEEE Trans. Multim., 2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.

CoRR, 2023

Automated Conversion of Music Videos into Lyric Videos.

Rubaiat Habib Kazi, Hijung Valentina Shin, Maneesh Agrawala

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production.

Proceedings of the ACM SIGGRAPH 2023 Posters, 2023

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Adding Conditional Control to Text-to-Image Diffusion Models.

Maneesh Agrawala

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Jointly Learning the Attributes and Composition of Shots for Boundary Detection in Videos.

IEEE Trans. Multim., 2022

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows.

CoRR, 2022

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language.

CoRR, 2022

Shoot360: Normal View Video Creation from City Panorama Footage.

Proceedings of the SIGGRAPH '22: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Vancouver, BC, Canada, August 7, 2022

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering.

Christian Theobalt

Proceedings of the Computer Vision - ECCV 2022, 2022

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

CityNeRF: Building NeRF at City Scale.

Christian Theobalt

CoRR, 2021

BlockPlanner: City Block Generation with Vectorized Graph Representation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Online Multi-modal Person Search in Videos.

Proceedings of the Computer Vision - ECCV 2020, 2020

A Unified Framework for Shot Type Classification Based on Subject Centric Lens.

Proceedings of the Computer Vision - ECCV 2020, 2020

MovieNet: A Holistic Dataset for Movie Understanding.

Proceedings of the Computer Vision - ECCV 2020, 2020

A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2018

Automatic Music Accompanist.

Francis C. M. Lau

CoRR, 2018

HotFlip: White-Box Adversarial Examples for Text Classification.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

HotFlip: White-Box Adversarial Examples for NLP.

CoRR, 2017
