Youngjae Yu

ORCID: 0000-0002-5867-0782

According to our database, Youngjae Yu authored at least 49 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents.
IEEE Robotics Autom. Lett., February, 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.
CoRR, 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation.
CoRR, 2024

Aligning Large Language Models by On-Policy Self-Judgment.
CoRR, 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback.
CoRR, 2024

2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models.
CoRR, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.
CoRR, 2023

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering.
CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Long Story Short: a Summarize-then-Search Method for Prompt-Based Long Video Question Answering.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
CoRR, 2022

Learning Joint Representation of Human Motion and Language.
CoRR, 2022

Active Visual Search in the Wild.
CoRR, 2022

Multimodal Knowledge Alignment with Reinforcement Learning.
CoRR, 2022

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Cycled Compositional Learning between Images and Text.
CoRR, 2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning.
CoRR, 2021

MERLOT: Multimodal Neural Script Knowledge Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Learning of Compressed Video Representations.
Proceedings of the 9th International Conference on Learning Representations, 2021

Parameter Efficient Multimodal Transformers for Video Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Transitional Adaptation of Pretrained Models for Visual Storytelling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual Compositional Learning in Interactive Image Retrieval.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data.
CoRR, 2020

Character Grounding and Re-identification in Story of Videos and Text Descriptions.
Proceedings of the Computer Vision - ECCV 2020, 2020

Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context.
Proceedings of the Second Workshop on Figurative Language Processing, 2020

2019
Video Question Answering with Spatio-Temporal Reasoning.
Int. J. Comput. Vis., 2019

2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval.
Proceedings of the Computer Vision - ECCV 2018, 2018

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset.
CoRR, 2017

TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.
Bioinform., 2017

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Video Captioning and Retrieval Models with Semantic Attention.
CoRR, 2016
