Bin Lin

Orcid: 0009-0003-4805-9730

Affiliations:
  • Peking University, Shenzhen Graduate School, Rabbitpre Intelligence, China


According to our database, Bin Lin authored at least 34 papers between 2023 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Manifold-Aware Exploration for Reinforcement Learning in Video Generation.
CoRR, March, 2026

iFSQ: Improving FSQ for Image Generation with 1 Line of Code.
CoRR, January, 2026

Look-Back: Implicit Visual Re-focusing in MLLM Reasoning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Next Patch Prediction for AutoRegressive Visual Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

360Explorer: Exploring 4D Controllable World in Panoramic Videos.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization.
CoRR, November, 2025

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward.
CoRR, November, 2025

Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback.
CoRR, October, 2025

MagicTime: Time-Lapse Video Generation Models as Metamorphic Simulators.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation.
CoRR, September, 2025

Unified Multimodal Model as Auto-Encoder.
CoRR, September, 2025

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation.
CoRR, June, 2025

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation.
CoRR, May, 2025

ImgEdit: A Unified Image Editing Dataset and Benchmark.
CoRR, May, 2025

SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video.
CoRR, March, 2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation.
CoRR, March, 2025

TaxDiff: taxonomic-guided diffusion model for protein sequence generation.
Sci. China Inf. Sci., 2025

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Next Patch Prediction for Autoregressive Visual Generation.
CoRR, 2024

DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses.
CoRR, 2024

Open-Sora Plan: Open-Source Large Video Generation Model.
CoRR, 2024

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model.
CoRR, 2024

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark.
CoRR, 2024

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation.
CoRR, 2024

LLMBind: A Unified Modality-Task Integration Framework.
CoRR, 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models.
CoRR, 2024

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models.
CoRR, 2023

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment.
CoRR, 2023

