Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild.

Junhyeok Kim

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Scalp Diagnostic System with Label-Free Segmentation and Training-Free Image Translation.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

VAGUE: Visual Contexts Clarify Ambiguous Expressions.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Representation Bending for Large Language Model Safety.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

MASS: Overcoming Language Bias in Image-Text Matching.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents.

IEEE Robotics Autom. Lett., February, 2024

TIPO: Text to Image with Text Presampling for Prompt Optimization.

CoRR, 2024

C²: Scalable Auto-Feedback for LLM-based Chart Generation.

CoRR, 2024

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation.

CoRR, 2024

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment.

CoRR, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.

CoRR, 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.

CoRR, 2024

Towards Visual Text Design Transfer Across Languages.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ActionSwitch: Class-Agnostic Detection of Simultaneous Actions in Streaming Videos.

Proceedings of the Computer Vision - ECCV 2024, 2024

Aligning Large Language Models by On-Policy Self-Judgment.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering.

Jiwan Chung

Youngjae Yu

CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.

Jae Sung Park

Jack Hessel

Khyathi Raghavi Chandu

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation.

Jiwan Chung

Youngjae Yu

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning.

Prithviraj Ammanabrolu

Ronan Le Bras

Gunhee Kim

Yejin Choi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Long Story Short: a Summarize-then-Search Method for Prompt-Based Long Video Question Answering.

Jiwan Chung

Youngjae Yu

Proceedings of the 34th British Machine Vision Conference 2023, 2023

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.

CoRR, 2022

Learning Joint Representation of Human Motion and Language.

CoRR, 2022

Active Visual Search in the Wild.

CoRR, 2022

Multimodal Knowledge Alignment with Reinforcement Learning.

Prithviraj Ammanabrolu

CoRR, 2022

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Cycled Compositional Learning between Images and Text.

CoRR, 2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning.

CoRR, 2021

MERLOT: Multimodal Neural Script Knowledge Models.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Learning of Compressed Video Representations.

Proceedings of the 9th International Conference on Learning Representations, 2021

Parameter Efficient Multimodal Transformers for Video Representation Learning.

Proceedings of the 9th International Conference on Learning Representations, 2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Transitional Adaptation of Pretrained Models for Visual Storytelling.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual Compositional Learning in Interactive Image Retrieval.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data.

CoRR, 2020

Character Grounding and Re-identification in Story of Videos and Text Descriptions.

Proceedings of the Computer Vision - ECCV 2020, 2020

Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context.

Hankyol Lee

Youngjae Yu

Gunhee Kim

Proceedings of the Second Workshop on Figurative Language Processing, 2020

2019

Video Question Answering with Spatio-Temporal Reasoning.

Int. J. Comput. Vis., 2019

2018

A Joint Sequence Fusion Model for Video Question Answering and Retrieval.

Youngjae Yu

Jongseok Kim

Gunhee Kim

Proceedings of the Computer Vision - ECCV 2018, 2018

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset.

CoRR, 2017

TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

Bioinform., 2017

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Video Captioning and Retrieval Models with Semantic Attention.

CoRR, 2016

Youngjae Yu
