Xin Wang

CoRR, 2023

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens.

Kaizhi Zheng

Xuehai He

CoRR, 2023

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation.

CoRR, 2023

R2H: Building Multimodal Navigation Helpers that Respond to Help.

CoRR, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners.

CoRR, 2023

CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection.

Swati Jindal

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PHOTOSWAP: Personalized Subject Swapping in Images.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation.

Proceedings of the International Conference on Machine Learning, 2023

Neuro-Symbolic Procedural Planning with Commonsense Prompting.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment.

Zhen Zhang

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Multimodal Graph Transformer for Multimodal Question Answering.

Xuehai He

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Anticipating the Unseen Discrepancy for Vision and Language Navigation.

CoRR, 2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting.

CoRR, 2022

Aerial Vision-and-Dialog Navigation.

CoRR, 2022

Parameter-efficient Fine-tuning for Vision Transformers.

CoRR, 2022

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation.

Kaizhi Zheng

Xiaotong Chen

Odest Chadwicke Jenkins

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Imagination-Augmented Natural Language Understanding.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Understanding Instance-Level Impact of Fairness Constraints.

Yang Liu

Proceedings of the International Conference on Machine Learning, 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation.

Kaiwen Zhou

Proceedings of the Computer Vision - ECCV 2022, 2022

Language-Driven Artistic Style Transfer.

Tsu-Jui Fu

Proceedings of the Computer Vision - ECCV 2022, 2022

M³L: Language-based Video Editing via Multi-Modal Multi-Level Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Assessing Multilingual Fairness in Pre-trained Multimodal Representations.

Yang Liu

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Vision-Language Navigation Policy Learning and Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection.

Swati Jindal

CoRR, 2021

Language-Driven Image Style Transfer.

Tsu-Jui Fu

CoRR, 2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer.

CoRR, 2021

Visual Question Rewriting for Increasing Response Rate.

Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search.

Yang Liu

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

Closing the Loop Between Language and Vision for Embodied Agents.

PhD thesis, 2020

Relational Graph Learning for Grounded Video Description Generation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation.

Jiannan Xiang

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Environment-Agnostic Multitask Learning for Natural Language Grounded Navigation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler.

Proceedings of the Computer Vision - ECCV 2020, 2020

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling.

CoRR, 2019

Cross-Lingual Vision-Language Navigation.

CoRR, 2019

Not All Actions Are Equal: Learning to Stop in Language-Grounded Urban Navigation.

Jiannan Xiang

Proceedings of the Visually Grounded Interaction and Language (ViGIL), 2019

Natural Language Grounded Multitask Navigation.

Proceedings of the Visually Grounded Interaction and Language (ViGIL), 2019

Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation.

Jiawei Wu

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

TIGEr: Text-to-Image Grounding for Image Caption Evaluation.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised Dialogue Learning.

Jiawei Wu

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Self-Supervised Learning for Contextualized Extractive Summarization.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Virtual dictionary based kernel sparse representation for face recognition.

Pattern Recognit., 2018

Enhancing the Robustness of Prior Network in Out-of-Distribution Detection.

CoRR, 2018

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning.

Yuan-Fang Wang