Lucas Beyer

CoRR, February, 2025

2024

PaliGemma 2: A Family of Versatile VLMs for Transfer.

CoRR, 2024

PaliGemma: A versatile 3B VLM for transfer.

CoRR, 2024

LocCa: Visual Pretraining with Location-aware Captioners.

Ibrahim M. Alabdulmohsin

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models.

Ibrahim M. Alabdulmohsin

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

On Scaling Up a Multilingual Vision and Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger.

CoRR, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.

CoRR, 2023

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision.

CoRR, 2023

Image Captioners Are Scalable Vision Learners Too.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models.

Effrosyni Kokiopoulou

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design.

Ibrahim M. Alabdulmohsin

Alexander Kolesnikov

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tuning Computer Vision Models With Task Rewards.

Proceedings of the International Conference on Machine Learning, 2023

Scaling Vision Transformers to 22 Billion Parameters.

Proceedings of the International Conference on Machine Learning, 2023

PaLI: A Jointly-Scaled Multilingual Language-Image Model.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sigmoid Loss for Language Image Pre-Training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FlexiViT: One Model for All Patch Sizes.

Filip Pavetic

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Deep visual human sensing with application in robotics.

PhD thesis, 2022

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers.

Trans. Mach. Learn. Res., 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up.

Jascha Sohl-Dickstein

CoRR, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model.

CoRR, 2022

Better plain ViT baselines for ImageNet-1k.

Alexander Kolesnikov

CoRR, 2022

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Efficiency Misnomer.

Proceedings of the Tenth International Conference on Learning Representations, 2022

A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

LiT: Zero-Shot Transfer with Locked-image text Tuning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Vision Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Kubric: A scalable dataset generator.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Knowledge distillation: A good teacher is patient and consistent.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation.

CoRR, 2021

SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size.

CoRR, 2021

MLP-Mixer: An all-MLP Architecture for Vision.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Proceedings of the 9th International Conference on Learning Representations, 2021

On Robustness and Transferability of Convolutional Neural Networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Are we done with ImageNet?

CoRR, 2020

Big Transfer (BiT): General Visual Representation Learning.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Large Scale Learning of General Visual Representations for Transfer.

CoRR, 2019

The Visual Task Adaptation Benchmark.

CoRR, 2019

MULEX: Disentangling Exploitation from Exploration in Deep RL.

CoRR, 2019

Deep multi-class learning from label proportions.

CoRR, 2019

S4L: Self-Supervised Semi-Supervised Learning.

CoRR, 2019

S4L: Self-Supervised Semi-Supervised Learning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Revisiting Self-Supervised Visual Representation Learning.

Alexander Kolesnikov

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Deep Person Detection in Two-Dimensional Range Data.

IEEE Robotics Autom. Lett., 2018

Deep Person Detection in 2D Range Data.

CoRR, 2018

Detection-Tracking for Efficient Person Analysis: The DetTA Pipeline.

Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

2017

The STRANDS Project: Long-Term Autonomy in Everyday Environments.

IEEE Robotics Autom. Mag., 2017

DROW: Real-Time Deep Learning-Based Wheelchair Detection in 2-D Range Data.

IEEE Robotics Autom. Lett., 2017

The Atari Grand Challenge Dataset.

CoRR, 2017

In Defense of the Triplet Loss for Person Re-Identification.

CoRR, 2017

Towards a Principled Integration of Multi-camera Re-identification and Tracking Through Optimal Bayes Filters.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017

2016

DROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range Data.

CoRR, 2016

2015

SPENCER: A Socially Aware Service Robot for Passenger Guidance and Help in Busy Airports.

Proceedings of the Field and Service Robotics, 2015

Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels.

Proceedings of the Pattern Recognition - 37th German Conference, 2015

2013

Streaming Data from HDD to GPUs for Sustained Peak Performance.

Paolo Bientinesi

CoRR, 2013

GWAS on GPUs: Streaming Data from HDD for Sustained Performance.
