Jaemin Cho

Orcid: 0000-0002-1558-6169

Affiliations:
  • UNC Chapel Hill, NC, USA
  • Allen Institute for AI, Seattle, WA, USA (former)


According to our database, Jaemin Cho authored at least 40 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation.
CoRR, August, 2025

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents.
CoRR, August, 2025

A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality.
CoRR, July, 2025

CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval.
CoRR, June, 2025

Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning.
CoRR, June, 2025

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance.
CoRR, May, 2025

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting.
CoRR, April, 2025

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems.
CoRR, April, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization.
CoRR, April, 2025

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement.
CoRR, 2024

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding.
CoRR, 2024

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents.
CoRR, 2024

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Contrastive Region Guidance: Improving Grounding in Vision-Language Models Without Training.
Proceedings of the Computer Vision - ECCV 2024, 2024

DOCCI: Descriptions of Connected and Contrasting Images.
Proceedings of the Computer Vision - ECCV 2024, 2024

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning.
CoRR, 2023

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning.
CoRR, 2023

Visual Programming for Text-to-Image Generation and Evaluation.
CoRR, 2023

PERCEIVER-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Self-Chained Image-Language Model for Video Localization and Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Paxion: Patching Action Knowledge in Video-Language Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DALL-EVAL: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Hierarchical Video-Moment Retrieval and Step-Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers.
CoRR, 2022

TVLT: Textless Vision-Language Transformer.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-grained Image Captioning with CLIP Reward.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

VL-ADAPTER: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Unifying Vision-and-Language Tasks via Text Generation.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019
Mixture Content Selection for Diverse Sequence Generation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
A Hierarchical Latent Structure for Variational Conversation Modeling.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
