Anna Rohrbach

Orcid: 0000-0003-1161-6006

Affiliations:
  • TU Darmstadt, Germany
  • University of California, Berkeley, CA, USA (former)
  • Max Planck Institute for Informatics, Saarbrücken, Germany (former)


According to our database, Anna Rohrbach authored at least 62 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Simple Token-Level Confidence Improves Caption Correctness.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Shape-Guided Diffusion with Inside-Outside Attention.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

2023
Object-based (yet Class-agnostic) Video Domain Adaptation.
CoRR, 2023

More Control for Free! Image Synthesis with Semantic Diffusion Guidance.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Using Language to Extend to Unseen Domains.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Revisiting Generalizability in Deepfake Detection: Improving Metrics and Stabilizing Transfer.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
G^3: Geolocation via Guidebook Grounding.
CoRR, 2022

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency.
CoRR, 2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022.
CoRR, 2022

K-LITE: Learning Transferable Visual Models with External Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exposing the Limits of Video-Text Models through Contrast Sets.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

How Much Can CLIP Benefit Vision-and-Language Tasks?
Proceedings of the Tenth International Conference on Learning Representations, 2022

Focus! Relevant and Sufficient Context Selection for News Image Captioning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

G3: Geolocation via Guidebook Grounding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly.
Proceedings of the Computer Vision - ECCV 2022, 2022

TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency.
Proceedings of the Computer Vision - ECCV 2022, 2022

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning.
Proceedings of the Computer Vision - ECCV 2022, 2022

On Guiding Visual Attention with Language Specification.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Object-Region Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DETReg: Unsupervised Pretraining with Region Priors for Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Benchmark for Compositional Text-to-Image Synthesis.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

CLIP-It! Language-Guided Video Summarization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Compositional Video Synthesis with Action Graphs.
Proceedings of the 38th International Conference on Machine Learning, 2021

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Identity-Aware Multi-sentence Video Description.
Proceedings of the Computer Vision - ECCV 2020, 2020

Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Viewpoint Invariant Change Captioning.
CoRR, 2019

Robust Change Captioning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Language-Conditioned Graph Networks for Relational Reasoning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Adversarial Inference for Multi-Sentence Video Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract).
CoRR, 2018

Women also Snowboard: Overcoming Bias in Captioning Models.
CoRR, 2018

Speaker-Follower Models for Vision-and-Language Navigation.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A vision-grounded dataset for predicting typical locations for verbs.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Object Hallucination in Image Captioning.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Textual Explanations for Self-Driving Vehicles.
Proceedings of the Computer Vision - ECCV 2018, 2018

Video Object Segmentation with Referring Expressions.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Women Also Snowboard: Overcoming Bias in Captioning Models.
Proceedings of the Computer Vision - ECCV 2018, 2018

Fooling Vision and Language Models Despite Localization and Attention Mechanism.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Video Object Segmentation with Language Referring Expressions.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Generation and grounding of natural language descriptions for visual data.
PhD thesis, 2017

Movie Description.
Int. J. Comput. Vis., 2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract).
CoRR, 2017

Can you fool AI with adversarial examples on a visual Turing test?
CoRR, 2017

Generating Descriptions with Grounded and Co-referenced People.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Gradient-free Policy Architecture Search and Adaptation.
Proceedings of the 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, 2017

2016
Recognizing Fine-Grained and Composite Activities Using Hand-Centric Features and Script Data.
Int. J. Comput. Vis., 2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Grounding of Textual Phrases in Images by Reconstruction.
Proceedings of the Computer Vision - ECCV 2016, 2016

Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
The Long-Short Story of Movie Description.
Proceedings of the Pattern Recognition - 37th German Conference, 2015

A dataset for Movie Description.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Coherent Multi-Sentence Video Description with Variable Level of Detail.
CoRR, 2014

Coherent Multi-sentence Video Description with Variable Level of Detail.
Proceedings of the Pattern Recognition - 36th German Conference, 2014

