Jaemin Cho

Orcid: 0000-0002-1558-6169

Affiliations:

UNC Chapel Hill, NC, USA
Allen Institute for AI, Seattle, WA, USA (former)

According to our database, Jaemin Cho authored at least 41 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Links

Online presence:

on orcid.org
on j-min.io

Bibliography

2025

One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration.

Elias Stengel-Eskin

CoRR, October, 2025

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation.

Elias Stengel-Eskin

CoRR, August, 2025

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents.

CoRR, August, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality.

CoRR, July, 2025

CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval.

Elias Stengel-Eskin

CoRR, June, 2025

Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning.

CoRR, June, 2025

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance.

CoRR, May, 2025

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting.

Elias Stengel-Eskin

CoRR, April, 2025

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems.

Elias Stengel-Eskin

CoRR, April, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization.

CoRR, April, 2025

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback.

Elias Stengel-Eskin

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement.

CoRR, 2024

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding.

Debanjan Mahata

CoRR, 2024

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents.

CoRR, 2024

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.

Jason M. Baldridge

Jordi Pont-Tuset

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Contrastive Region Guidance: Improving Grounding in Vision-Language Models Without Training.

Elias Stengel-Eskin

Proceedings of the Computer Vision - ECCV 2024, 2024

DOCCI: Descriptions of Connected and Contrasting Images.

Jordi Pont-Tuset

Jason Baldridge

Proceedings of the Computer Vision - ECCV 2024, 2024

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts.

Marc Niethammer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning.

CoRR, 2023

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning.

CoRR, 2023

Visual Programming for Text-to-Image Generation and Evaluation.

CoRR, 2023

PERCEIVER-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Self-Chained Image-Language Model for Video Localization and Question Answering.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Paxion: Patching Action Knowledge in Video-Language Foundation Models.

Zhenhailong Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hierarchical Video-Moment Retrieval and Step-Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers.

CoRR, 2022

TVLT: Textless Vision-Language Transformer.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-grained Image Captioning with CLIP Reward.

Franck Dernoncourt

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding.

Revanth Gangi Reddy

Alexander G. Schwing

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Unifying Vision-and-Language Tasks via Text Generation.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers.

Hannaneh Hajishirzi

Aniruddha Kembhavi

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019

Mixture Content Selection for Diverse Sequence Generation.

Hannaneh Hajishirzi

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

A Hierarchical Latent Structure for Variational Conversation Modeling.

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
