Jean-Baptiste Alayrac

Orcid: 0000-0002-3071-4157

According to our database, Jean-Baptiste Alayrac authored at least 40 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.
CoRR, 2024

2023
Gemini: A Family of Highly Capable Multimodal Models.
CoRR, 2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime.
CoRR, 2023

Three ways to improve feature alignment for open vocabulary detection.
CoRR, 2023

Zorro: the masked multimodal transformer.
CoRR, 2023

2022
Multi-Task Learning of Object State Changes from Uncurated Videos.
CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.
CoRR, 2022


General-purpose, long-context autoregressive modeling with Perceiver AR.
Proceedings of the International Conference on Machine Learning, 2022

Perceiver IO: A General Architecture for Structured Inputs & Outputs.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Towards Learning Universal Audio Representations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers.
Trans. Assoc. Comput. Linguistics, 2021

Generative Art Using Neural Visual Grammars and Dual Encoders.
CoRR, 2021

Multimodal Self-Supervised Learning of General Audio Representations.
CoRR, 2021

Broaden Your Views for Self-Supervised Video Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Efficient Visual Pretraining with Contrastive Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Machine Translation Decoding beyond Beam Search.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
RareAct: A video dataset of unusual interactions.
CoRR, 2020

Self-Supervised MultiModal Versatile Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning Actionness via Long-Range Temporal Order Verification.
Proceedings of the Computer Vision - ECCV 2020, 2020

Visual Grounding in Video for Unsupervised Word Translation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

End-to-End Learning of Visual Representations From Uncurated Instructional Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning to Segment Actions from Observation and Narration.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Are Labels Required for Improving Adversarial Robustness?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Controllable Attention for Structured Layered Video Decomposition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Cross-Task Weakly Supervised Learning From Instructional Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

The Visual Centrifuge: Model-Free Layered Video Representations.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Structured Learning from Videos and Language. (Apprentissage structuré à partir de vidéos et langage).
PhD thesis, 2018

Learning from Narrated Instruction Videos.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions.
CoRR, 2018

A flexible model for training action localization with varying levels of supervision.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

SEARNN: Training RNNs with global-local losses.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Joint Discovery of Object States and Manipulating Actions.
CoRR, 2017

Learning from Video and Text via Large-Scale Discriminative Clustering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Joint Discovery of Object States and Manipulation Actions.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Unsupervised Learning from Narrated Instruction Videos.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

