Antoine Miech

IEEE Trans. Pattern Anal. Mach. Intell., 2024

A Simple Recipe for Contrastively Pre-Training Video-First Encoders Beyond 16 Frames.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime.

Chuhan Zhang

Jiajun Shen

Pauline Luc

CoRR, 2023

Zorro: the masked multimodal transformer.

CoRR, 2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Multi-Task Learning of Object State Changes from Uncurated Videos.

Tomás Soucek

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

CoRR, 2022

Zero-Shot Video Question Answering via Frozen Bidirectional Language Models.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.

Tomás Soucek

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Just Ask: Learning to Answer Questions from Millions of Narrated Videos.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers.

Andrew Zisserman

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Large-scale Learning from Video and Natural Language. (Apprentissage vidéo et langage naturel à grande échelle).

PhD thesis, 2020

RareAct: A video dataset of unusual interactions.

Andrew Zisserman

CoRR, 2020

The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020).

CoRR, 2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.

Dimitri Zhukov

Makarand Tapaswi

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Leveraging the Present to Anticipate the Future in Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data.

CoRR, 2018

2017

Learnable pooling with Context Gating for video classification.

CoRR, 2017

Learning from Video and Text via Large-Scale Discriminative Clustering.

Piotr Bojanowski