Yufan Zhou

Affiliations:
  • Luma AI, San Francisco, CA, USA
  • Adobe, San Jose, CA, USA
  • University at Buffalo, Department of Computer Science and Engineering, Buffalo, NY, USA


According to our database, Yufan Zhou authored at least 39 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation.
CoRR, July, 2025

Towards Visual Text Grounding of Multimodal Large Language Model.
CoRR, April, 2025

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models.
CoRR, February, 2025

ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Numerical Pruning for Efficient Autoregressive Models.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Numerical Pruning for Efficient Autoregressive Models.
CoRR, 2024

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner.
CoRR, 2024

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding.
CoRR, 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models.
CoRR, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models.
CoRR, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models.
CoRR, 2024

ARTIST: Improving the Generation of Text-rich Images by Disentanglement.
CoRR, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation.
CoRR, 2024

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Customization Assistant for Text-to-image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRINS: Towards Multimodal Language Models that Can Read.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.
CoRR, 2023

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach.
CoRR, 2023

Shifted Diffusion for Text-to-image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Lafite2: Few-shot Text-to-Image Generation.
CoRR, 2022

Towards Language-Free Training for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TiGAN: Text-Based Interactive Image Generation and Manipulation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
A Generic Approach for Enhancing GANs by Regularized Latent Optimization.
CoRR, 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation.
CoRR, 2021

Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels.
CoRR, 2021

Meta-Learning with Neural Tangent Kernels.
Proceedings of the 9th International Conference on Learning Representations, 2021

MixKD: Towards Efficient Distillation of Large-scale Language Models.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Graph Neural Networks with Composite Kernels.
CoRR, 2020

Learning Manifold Implicitly via Explicit Heat-Kernel Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Weakly-Supervised Brain Tumor Classification with Global Diagnosis Label.
Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020

Variational Adversarial Kernel Learned Imitation Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
KernelNet: A Data-Dependent Kernel Parameterization for Deep Generative Modeling.
CoRR, 2019

2018
Holistic Brain Tumor Screening and Classification Based on DenseNet and Recurrent Neural Network.
Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2018

