Jean-Baptiste Alayrac

IEEE Trans. Pattern Anal. Mach. Intell., 2024

Capabilities of Gemini Models in Medicine.

Juanma Zambrano Chaves

Philip Andrew Mansfield

Alan Karthikesalingam

Vivek Natarajan

CoRR, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.

et al.

CoRR, 2024

2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime.

Chuhan Zhang

Jiajun Shen

Pauline Luc

CoRR, 2023

Three ways to improve feature alignment for open vocabulary detection.

CoRR, 2023

Zorro: the masked multimodal transformer.

CoRR, 2023

2022

Multi-Task Learning of Object State Changes from Uncurated Videos.

Tomás Soucek

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

General-purpose, long-context autoregressive modeling with Perceiver AR.

Jesse H. Engel

Proceedings of the International Conference on Machine Learning, 2022

Perceiver IO: A General Architecture for Structured Inputs & Outputs.

Andrew Jaegle

Sebastian Borgeaud

Proceedings of the Tenth International Conference on Learning Representations, 2022

Towards Learning Universal Audio Representations.

Sander Dieleman

Aäron van den Oord

Proceedings of the IEEE International Conference on Acoustics, 2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.

Tomás Soucek

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers.

Lisa Anne Hendricks

John Mellor

Rosalia Schneider

Aida Nematzadeh

Trans. Assoc. Comput. Linguistics, 2021

Generative Art Using Neural Visual Grammars and Dual Encoders.

Chrisantha Fernando

S. M. Ali Eslami

Piotr Mirowski

Dylan Banarse

Simon Osindero

CoRR, 2021

Multimodal Self-Supervised Learning of General Audio Representations.

Luyu Wang

Pauline Luc

Adrià Recasens

Aäron van den Oord

CoRR, 2021

Broaden Your Views for Self-Supervised Video Learning.

Adrià Recasens

Pauline Luc

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Efficient Visual Pretraining with Contrastive Detection.

Olivier J. Hénaff

Skanda Koppula

Aäron van den Oord

Oriol Vinyals

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Machine Translation Decoding beyond Beam Search.

Rémi Leblond

Laurent Sifre

Miruna Pislar

Jean-Baptiste Lespiau

Ioannis Antonoglou

Karen Simonyan

Oriol Vinyals

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

RareAct: A video dataset of unusual interactions.

CoRR, 2020

Self-Supervised MultiModal Versatile Networks.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning Actionness via Long-Range Temporal Order Verification.

Dimitri Zhukov

Proceedings of the Computer Vision - ECCV 2020, 2020

Visual Grounding in Video for Unsupervised Word Translation.

Gunnar A. Sigurdsson

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning to Segment Actions from Observation and Narration.

Daniel Fried

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Are Labels Required for Improving Adversarial Robustness?

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.

Dimitri Zhukov

Makarand Tapaswi

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Controllable Attention for Structured Layered Video Decomposition.

Relja Arandjelovic

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Cross-Task Weakly Supervised Learning From Instructional Videos.

Dimitri Zhukov

Ramazan Gokberk Cinbis

David F. Fouhey

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

The Visual Centrifuge: Model-Free Layered Video Representations.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Structured Learning from Videos and Language. (Apprentissage structuré à partir de vidéos et langage).

PhD thesis, 2018

Learning from Narrated Instruction Videos.

IEEE Trans. Pattern Anal. Mach. Intell., 2018

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions.

Meera Hahn

Nataniel Ruiz

James M. Rehg

CoRR, 2018

A flexible model for training action localization with varying levels of supervision.

Guilhem Chéron

Cordelia Schmid

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SEARNN: Training RNNs with global-local losses.

Rémi Leblond

Anton Osokin

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Joint Discovery of Object States and Manipulating Actions.

CoRR, 2017

Learning from Video and Text via Large-Scale Discriminative Clustering.

Piotr Bojanowski

Proceedings of the IEEE International Conference on Computer Vision, 2017

Joint Discovery of Object States and Manipulation Actions.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs.

Anton Osokin

Isabella Lukasewitz

Puneet Kumar Dokania

Proceedings of the 33rd International Conference on Machine Learning, 2016

Unsupervised Learning from Narrated Instruction Videos.
