Youngjae Yu

ORCID: 0000-0002-5867-0782

According to our database, Youngjae Yu authored at least 95 papers between 2016 and 2025.

Bibliography

2025
InfoCausalQA: Can Models Perform Non-explicit Causal Reasoning Based on Infographic?
CoRR, August, 2025

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models.
CoRR, August, 2025

HIPPO-Video: Simulating Watch Histories with Large Language Models for Personalized Video Highlighting.
CoRR, July, 2025

SlumpGuard: An AI-Powered Real-Time System for Automated Concrete Slump Prediction via Video Analysis.
CoRR, July, 2025

NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance.
CoRR, July, 2025

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers.
CoRR, June, 2025

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making.
CoRR, May, 2025

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation.
CoRR, May, 2025

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation.
CoRR, May, 2025

DUSK: Do Not Unlearn Shared Knowledge.
CoRR, May, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research.
CoRR, May, 2025

G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness.
CoRR, May, 2025

Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games.
CoRR, April, 2025

Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation.
CoRR, April, 2025

Representation Bending for Large Language Model Safety.
CoRR, April, 2025

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms.
CoRR, March, 2025

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance.
CoRR, March, 2025

Teaching Metric Distance to Autoregressive Multimodal Foundational Models.
CoRR, March, 2025

KL Penalty Control via Perturbation for Direct Preference Optimization.
CoRR, February, 2025

SEAL: Entangled White-box Watermarks on Low-Rank Adaptation.
CoRR, January, 2025

Complete Coherent Demodulation and Recovery of Spread Spectrum Clocking-Based Electromagnetic Information Leakage: Theory and Demonstration.
IEEE Trans. Inf. Forensics Secur., 2025

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

C²: Scalable Auto-Feedback for LLM-based Chart Generation.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Representation Bending for Large Language Model Safety.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

MASS: Overcoming Language Bias in Image-Text Matching.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents.
IEEE Robotics Autom. Lett., February, 2024

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding.
CoRR, 2024

TIPO: Text to Image with Text Presampling for Prompt Optimization.
CoRR, 2024

C²: Scalable Auto-Feedback for LLM-based Chart Generation.
CoRR, 2024

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction.
CoRR, 2024

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation.
CoRR, 2024

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation.
CoRR, 2024

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment.
CoRR, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.
CoRR, 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.
CoRR, 2024

Towards Visual Text Design Transfer Across Languages.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ActionSwitch: Class-Agnostic Detection of Simultaneous Actions in Streaming Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

Aligning Large Language Models by On-Policy Self-Judgment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering.
CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Long Story Short: a Summarize-then-Search Method for Prompt-Based Long Video Question Answering.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
CoRR, 2022

Learning Joint Representation of Human Motion and Language.
CoRR, 2022

Active Visual Search in the Wild.
CoRR, 2022

Multimodal Knowledge Alignment with Reinforcement Learning.
CoRR, 2022

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Cycled Compositional Learning between Images and Text.
CoRR, 2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning.
CoRR, 2021

MERLOT: Multimodal Neural Script Knowledge Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Learning of Compressed Video Representations.
Proceedings of the 9th International Conference on Learning Representations, 2021

Parameter Efficient Multimodal Transformers for Video Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Transitional Adaptation of Pretrained Models for Visual Storytelling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual Compositional Learning in Interactive Image Retrieval.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data.
CoRR, 2020

Character Grounding and Re-identification in Story of Videos and Text Descriptions.
Proceedings of the Computer Vision - ECCV 2020, 2020

Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context.
Proceedings of the Second Workshop on Figurative Language Processing, 2020

2019
Video Question Answering with Spatio-Temporal Reasoning.
Int. J. Comput. Vis., 2019

2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval.
Proceedings of the Computer Vision - ECCV 2018, 2018

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset.
CoRR, 2017

TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.
Bioinform., 2017

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Video Captioning and Retrieval Models with Semantic Attention.
CoRR, 2016

