Le Zhuo

ORCID: 0000-0001-7895-091X

According to our database, Le Zhuo authored at least 33 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling.
CoRR, July, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.
CoRR, July, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT.
CoRR, May, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.
CoRR, April, 2025

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning.
CoRR, April, 2025

OmniCaptioner: One Captioner to Rule Them All.
CoRR, April, 2025

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision.
CoRR, April, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework.
CoRR, March, 2025

Vision-to-Music Generation: A Survey.
CoRR, March, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.
CoRR, March, 2025

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models.
CoRR, January, 2025

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation.
CoRR, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling.
CoRR, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation.
CoRR, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.
CoRR, 2024

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions.
CoRR, 2023

GraphText: Graph Reasoning in Text Space.
CoRR, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Video Background Music Generation: Dataset, Method and Evaluation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Video Background Music Generation: Dataset, Method and Evaluation.
CoRR, 2022
