Jiasen Lu

According to our database, Jiasen Lu authored at least 51 papers between 2011 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Large Language Models are Universal Reasoners for Visual Generation.

CoRR, May, 2026

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos.

CoRR, February, 2026

2025

UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning.

CoRR, November, 2025

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing.

CoRR, October, 2025

Autoregressive Video Generation beyond Next Frames Prediction.

CoRR, September, 2025

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching.

CoRR, September, 2025

AToken: A Unified Tokenizer for Vision.

CoRR, September, 2025

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing.

CoRR, May, 2025

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding.

CoRR, March, 2025

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

STIV: Scalable Text and Image Conditioned Video Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

One Diffusion to Generate Them All.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

STIV: Scalable Text and Image Conditioned Video Generation.

CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.

CoRR, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models.

CoRR, 2024

SoupLM: Model Integration in Large Language and Multi-Modal Models.

CoRR, 2024

Preserving Identity with Variational Score for General-purpose 3D Editing.

CoRR, 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Cross-Domain Graph Convolutions for Adversarial Unsupervised Domain Adaptation.

IEEE Trans. Neural Networks Learn. Syst., August, 2023

UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

ASC me to Do Anything: Multi-task Training for Embodied AI.

CoRR, 2022

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Multi-Modal Answer Validation for Knowledge-Based VQA.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model.

CoRR, 2021

Container: Context Aggregation Network.

CoRR, 2021

Container: Context Aggregation Networks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Transferable Feature Learning on Graphs Across Visual Domains.

Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

2020

Visually Grounded Language Understanding and Generation.

Jiasen Lu

PhD thesis, 2020

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Spatially Aware Multimodal Transformers for TextVQA.

Proceedings of the Computer Vision - ECCV 2020, 2020

12-in-1: Multi-Task Vision and Language Representation Learning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Emergence of Compositional Language with Deep Generational Transmission.

CoRR, 2019

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation.

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Graph R-CNN for Scene Graph Generation.

Proceedings of the Computer Vision - ECCV 2018, 2018

Neural Baby Talk.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition.

Proceedings of the 2nd Annual Conference on Robot Learning, 2018

2017

VQA: Visual Question Answering - www.visualqa.org.

Int. J. Comput. Vis., 2017

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

ParlAI: A Dialog Research Software Platform.

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Hierarchical Question-Image Co-Attention for Visual Question Answering.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015

VQA: Visual Question Answering.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Human action segmentation with hierarchical supervoxel consistency.

Jiasen Lu

Ran Xu

Jason J. Corso

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2012

Palmprint and Face Multi-Modal Biometric Recognition Based on SDA-GSVD and Its Kernelization.

Sensors, 2012

2011

Supervised local sparsity preserving projection for face feature extraction.

Proceedings of the First Asian Conference on Pattern Recognition, 2011
