Yiren Song

Orcid: 0000-0002-7028-3347

According to our database, Yiren Song authored at least 66 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
AnySurf: Any Surface Generation with Directed Edge.
CoRR, May, 2026

SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution.
CoRR, May, 2026

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration.
CoRR, May, 2026

VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers.
CoRR, May, 2026

StreamingEffect: Real-Time Human-Centric Video Effect Generation.
CoRR, May, 2026

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation.
CoRR, May, 2026

EditTransfer++: Toward Faithful and Efficient Visual-Prompt-Guided Image Editing.
CoRR, May, 2026

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models.
CoRR, April, 2026

UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining.
CoRR, April, 2026

Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs.
CoRR, March, 2026

SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens.
CoRR, February, 2026

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Loom: Diffusion-Transformer for Interleaved Generation.
CoRR, December, 2025

Mitty: Diffusion-based Human-to-Robot Video Generation.
CoRR, December, 2025

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning.
CoRR, December, 2025

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos.
CoRR, December, 2025

OmniPSD: Layered PSD Generation with Diffusion Transformer.
CoRR, December, 2025

X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale.
CoRR, December, 2025

TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance.
CoRR, December, 2025

WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation.
CoRR, November, 2025

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment.
CoRR, November, 2025

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement.
CoRR, November, 2025

Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers.
CoRR, November, 2025

Personalized Vision via Visual In-Context Learning.
CoRR, September, 2025

WordCon: Word-level Typography Control in Scene Text Rendering.
CoRR, June, 2025

MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks.
CoRR, June, 2025

Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack.
CoRR, June, 2025

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers.
CoRR, May, 2025

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains.
CoRR, May, 2025

FocusedAD: Character-centric Movie Audio Description.
CoRR, April, 2025

TransAnimate: Taming Layer Diffusion to Generate RGBA Video.
CoRR, March, 2025

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer.
CoRR, March, 2025

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data.
CoRR, February, 2025

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation.
CoRR, February, 2025

Two-dimensional normalized knowledge distillation leveraging class relations.
J. Vis. Commun. Image Represent., 2025

StableMakeup: When Real-World Makeup Transfer Meets Diffusion Model.
Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, 2025

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

WMAdapter: Adding WaterMark Control to Latent Diffusion Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Image Watermarks are Removable using Controllable Regeneration from Clean Noise.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

FonTS: Text Rendering with Typography and Style Controls.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ArtEditor: Learning Customized Instructional Image Editor From Few-Shot Examples.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Any2anytryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Stable-Hair: Real-World Hair Transfer via Diffusion Model.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
GridShow: Omni Visual Generation.
CoRR, 2024

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation.
CoRR, 2024

Stable-Hair: Real-World Hair Transfer via Diffusion Model.
CoRR, 2024

Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?
CoRR, 2024

ProcessPainter: Learn Painting Process from Sequence Data.
CoRR, 2024

Fast Personalized Text-to-Image Syntheses With Attention Injection.
CoRR, 2024

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model.
CoRR, 2024

ProcessPainter: Learning to draw from sequence data.
Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

Can Simple Averaging Defeat Modern Watermarks?
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Fast Personalized Text to Image Synthesis with Attention Injection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-key Identification.
Proceedings of the Computer Vision - ECCV 2024, 2024

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Multi-sensor measurement fusion based on minimum mixture error entropy with non-Gaussian measurement noise.
Digit. Signal Process., 2022

CLIPTexture: Text-Driven Texture Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

CLIPFont: Text Guided Vector WordArt Generation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

