Le Zhuo

Orcid: 0000-0001-7895-091X

According to our database, Le Zhuo authored at least 38 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

PICABench: How Far Are We from Physically Realistic Image Editing?

CoRR, October, 2025

ProteinAE: Protein Diffusion Autoencoders for Structure Encoding.

CoRR, October, 2025

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding.

CoRR, October, 2025

Factuality Matters: When Image Generation and Editing Meet Structured Visuals.

CoRR, October, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.

Victor Shea-Jay Huang

CoRR, July, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.

CoRR, July, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.

CoRR, May, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.

CoRR, April, 2025

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning.

CoRR, April, 2025

OmniCaptioner: One Captioner to Rule Them All.

CoRR, April, 2025

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision.

CoRR, April, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.

CoRR, March, 2025

Vision-to-Music Generation: A Survey.

Victor Shea-Jay Huang

Yue Liao

CoRR, March, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.

Victor Shea-Jay Huang

CoRR, March, 2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.

CoRR, January, 2025

A Survey on Vision-to-Music Generation: Methods, Datasets, Evaluation, and Challenges.

Victor Shea-Jay Huang

Yue Liao

Proceedings of the 26th International Society for Music Information Retrieval Conference, 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation.

CoRR, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling.

CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.

CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.

CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.

CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.

CoRR, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions.

CoRR, 2023

GraphText: Graph Reasoning in Text Space.

CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Video Background Music Generation: Dataset, Method and Evaluation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022

Video Background Music Generation: Dataset, Method and Evaluation.

CoRR, 2022
