Xiaohua Zhai

CoRR, July, 2025

Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials.

CoRR, April, 2025

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features.

Michael Tschannen

Alexey A. Gritsenko

Xiao Wang

Muhammad Ferjad Naeem

CoRR, February, 2025

Scaling Pre-training to One Hundred Billion Data for Vision Language Models.

Xiao Wang

CoRR, February, 2025

Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models.

Int. J. Comput. Vis., April, 2024

PaliGemma 2: A Family of Versatile VLMs for Transfer.

CoRR, 2024

PaliGemma: A versatile 3B VLM for transfer.

CoRR, 2024

Toward a Diffusion-Based Generalist for Dense Vision Tasks.

Muhammad Ferjad Naeem

Bernt Schiele

Federico Tombari

CoRR, 2024

LocCa: Visual Pretraining with Location-aware Captioners.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Xiao Wang

Andreas Peter Steiner

Priya Goyal

Alexander D'Amour

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SILC: Improving Vision Language Pretraining with Self-distillation.

Muhammad Ferjad Naeem

Proceedings of the Computer Vision - ECCV 2024, 2024

On Scaling Up a Multilingual Vision and Language Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger.

CoRR, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.

CoRR, 2023

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision.

CoRR, 2023

Image Captioners Are Scalable Vision Learners Too.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models.

Effrosyni Kokiopoulou

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design.

Alexander Kolesnikov

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tuning Computer Vision Models With Task Rewards.

Proceedings of the International Conference on Machine Learning, 2023

Scaling Vision Transformers to 22 Billion Parameters.

Proceedings of the International Conference on Machine Learning, 2023

Sigmoid Loss for Language Image Pre-Training.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FlexiViT: One Model for All Patch Sizes.

Filip Pavetic

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers.

Trans. Mach. Learn. Res., 2022

Underspecification Presents Challenges for Credibility in Modern Machine Learning.

J. Mach. Learn. Res., 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model.

CoRR, 2022

Simple Open-Vocabulary Object Detection with Vision Transformers.

CoRR, 2022

Better plain ViT baselines for ImageNet-1k.

Alexander Kolesnikov

CoRR, 2022

Revisiting Neural Scaling Laws in Language and Vision.

Behnam Neyshabur

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Simple Open-Vocabulary Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.

Proceedings of the Computer Vision - ECCV 2022, 2022

LiT: Zero-Shot Transfer with Locked-image text Tuning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Vision Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Knowledge distillation: A good teacher is patient and consistent.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation.

CoRR, 2021

SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size.

CoRR, 2021

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark.

CoRR, 2021

MLP-Mixer: An all-MLP Architecture for Vision.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting the Calibration of Modern Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Proceedings of the 9th International Conference on Learning Representations, 2021

On Robustness and Transferability of Convolutional Neural Networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Are we done with ImageNet?

CoRR, 2020

Training General Representations for Remote Sensing Using in-Domain Knowledge.

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2020

Big Transfer (BiT): General Visual Representation Learning.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Large Scale Learning of General Visual Representations for Transfer.

CoRR, 2019

In-domain representation learning for remote sensing.

CoRR, 2019

The Visual Task Adaptation Benchmark.

CoRR, 2019

S4L: Self-Supervised Semi-Supervised Learning.

CoRR, 2019

High-Fidelity Image Generation With Fewer Labels.

Proceedings of the 36th International Conference on Machine Learning, 2019

A Large-Scale Study on Regularization and Normalization in GANs.

Proceedings of the 36th International Conference on Machine Learning, 2019

S4L: Self-Supervised Semi-Supervised Learning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Revisiting Self-Supervised Visual Representation Learning.

Alexander Kolesnikov

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised GANs via Auxiliary Rotation Loss.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Self-Supervised Generative Adversarial Networks.

CoRR, 2018

Self-Supervised GAN to Counter Forgetting.

Ting Chen

Neil Houlsby

CoRR, 2018

The GAN Landscape: Losses, Architectures, Regularization, and Normalization.

CoRR, 2018

MemGEN: Memory is All You Need.

CoRR, 2018

2016

Semi-Supervised Cross-Media Feature Learning With Unified Patch Graph Regularization.

IEEE Trans. Circuits Syst. Video Technol., 2016

2014

Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization.

IEEE Trans. Circuits Syst. Video Technol., 2014

2013

Cross-media retrieval by intra-media and inter-media correlation mining.

Multim. Syst., 2013

PKU_ICST at TRECVID2013: Instance Search Task.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Cross-media retrieval by cluster-based correlation analysis.

Ding Ma

Proceedings of the IEEE International Conference on Image Processing, 2013

Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval.

Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012

PKU-ICST @TRECVID2012: Known-item Search Task.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval.

Proceedings of the Advances in Multimedia Modeling - 18th International Conference, 2012

PDSS: patch-descriptor-similarity space for effective face verification.

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Tri-space and ranking based heterogeneous similarity measure for cross-media retrieval.

Li Ling

Proceedings of the 21st International Conference on Pattern Recognition, 2012

Cross-modality correlation propagation for cross-media retrieval.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2009

PKU-ICST at TRECVID2009: High Level Feature Extraction and Search.

Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

2006

Adaptive Control Based on Recurrent Fuzzy Wavelet Neural Network and Its Application on Robotic Tracking Control.

Wei Sun

Yaonan Wang