Anyi Rao

Orcid: 0000-0003-1004-7753

According to our database, Anyi Rao authored at least 52 papers between 2017 and 2026.

Bibliography

2026
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models.
CoRR, April, 2026

InstanceAnimator: Multi-Instance Sketch Video Colorization.
CoRR, March, 2026

Controllable Text-to-Motion Generation via Modular Body-Part Phase Control.
CoRR, March, 2026

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models.
CoRR, March, 2026

ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation.
CoRR, March, 2026

SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment.
CoRR, March, 2026

Collaposer: Transforming Photo Collections into Visual Assets for Storytelling with Collages.
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 2026

2025
Pretraining Frame Preservation in Autoregressive Video Memory Compression.
CoRR, December, 2025

Composing Concepts from Images and Videos via Concept-prompt Binding.
CoRR, December, 2025

Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration.
CoRR, October, 2025

Taming Flow-based I2V Models for Creative Video Editing.
CoRR, September, 2025

Dense Semantic Matching with VGGT Prior.
CoRR, September, 2025

DataSway: Vivifying Metaphoric Visualization with Animation Clip Generation and Coordination.
CoRR, July, 2025

Light of Normals: Unified Feature Representation for Universal Photometric Stereo.
CoRR, June, 2025

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images.
CoRR, May, 2025

Simulating the Real World: A Unified Survey of Multimodal Generative Models.
CoRR, March, 2025

CineVision: An Interactive Pre-visualization Storyboard System for Director-Cinematographer Collaboration.
Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, 2025

Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Light-a-Video: Training-Free Video Relighting via Progressive Light Fusion.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025


Keyframe-Guided Creative Video Inpainting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Mindalogue: LLM-Powered Nonlinear Interaction for Effective Learning and Task Exploration.
CoRR, 2024

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion.
CoRR, 2024

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database.
Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Generative Models for Visual Content Editing and Creation.
Proceedings of the ACM SIGGRAPH 2024 Courses, 2024

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Cinematic Behavior Transfer via NeRF-based Differentiable Filming.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
A Coarse-to-Fine Framework for Automatic Video Unscreen.
IEEE Trans. Multim., 2023

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.
CoRR, 2023

Automated Conversion of Music Videos into Lyric Videos.
Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023

Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production.
Proceedings of the ACM SIGGRAPH 2023 Posters, 2023

Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Adding Conditional Control to Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Jointly Learning the Attributes and Composition of Shots for Boundary Detection in Videos.
IEEE Trans. Multim., 2022

Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows.
CoRR, 2022

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language.
CoRR, 2022

Shoot360: Normal View Video Creation from City Panorama Footage.
Proceedings of the SIGGRAPH '22: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Vancouver, BC, Canada, August 7, 2022

BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering.
Proceedings of the Computer Vision - ECCV 2022, 2022

AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
CityNeRF: Building NeRF at City Scale.
CoRR, 2021

BlockPlanner: City Block Generation with Vectorized Graph Representation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Online Multi-modal Person Search in Videos.
Proceedings of the Computer Vision - ECCV 2020, 2020

A Unified Framework for Shot Type Classification Based on Subject Centric Lens.
Proceedings of the Computer Vision - ECCV 2020, 2020

MovieNet: A Holistic Dataset for Movie Understanding.
Proceedings of the Computer Vision - ECCV 2020, 2020

A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2018
Automatic Music Accompanist.
CoRR, 2018

HotFlip: White-Box Adversarial Examples for Text Classification.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
HotFlip: White-Box Adversarial Examples for NLP.
CoRR, 2017

