Xiaohua Zhai

According to our database, Xiaohua Zhai authored at least 65 papers between 2006 and 2024.

Bibliography

2024
Guest Editorial: Special Issue on the Promises and Dangers of Large Vision Models.
Int. J. Comput. Vis., April, 2024

LocCa: Visual Pretraining with Location-aware Captioners.
CoRR, 2024

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
CoRR, 2024

2023
SILC: Improving Vision Language Pretraining with Self-Distillation.
CoRR, 2023

PaLI-3 Vision Language Models: Smaller, Faster, Stronger.
CoRR, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.
CoRR, 2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models.
CoRR, 2023

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision.
CoRR, 2023

Scaling Vision Transformers to 22 Billion Parameters.
CoRR, 2023

Image Captioners Are Scalable Vision Learners Too.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Three Towers: Flexible Contrastive Learning with Pretrained Image Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tuning Computer Vision Models With Task Rewards.
Proceedings of the International Conference on Machine Learning, 2023


Sigmoid Loss for Language Image Pre-Training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FlexiViT: One Model for All Patch Sizes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers.
Trans. Mach. Learn. Res., 2022

Underspecification Presents Challenges for Credibility in Modern Machine Learning.
J. Mach. Learn. Res., 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model.
CoRR, 2022

Simple Open-Vocabulary Object Detection with Vision Transformers.
CoRR, 2022

Better plain ViT baselines for ImageNet-1k.
CoRR, 2022

Revisiting Neural Scaling Laws in Language and Vision.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022


A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

LiT: Zero-Shot Transfer with Locked-image text Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Knowledge distillation: A good teacher is patient and consistent.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation.
CoRR, 2021

SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size.
CoRR, 2021

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark.
CoRR, 2021

MLP-Mixer: An all-MLP Architecture for Vision.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting the Calibration of Modern Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
Proceedings of the 9th International Conference on Learning Representations, 2021

On Robustness and Transferability of Convolutional Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Are we done with ImageNet?
CoRR, 2020

Training General Representations for Remote Sensing Using in-Domain Knowledge.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2020

Big Transfer (BiT): General Visual Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Large Scale Learning of General Visual Representations for Transfer.
CoRR, 2019

In-domain representation learning for remote sensing.
CoRR, 2019

The Visual Task Adaptation Benchmark.
CoRR, 2019

S4L: Self-Supervised Semi-Supervised Learning.
CoRR, 2019

High-Fidelity Image Generation With Fewer Labels.
Proceedings of the 36th International Conference on Machine Learning, 2019

A Large-Scale Study on Regularization and Normalization in GANs.
Proceedings of the 36th International Conference on Machine Learning, 2019

S4L: Self-Supervised Semi-Supervised Learning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Revisiting Self-Supervised Visual Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised GANs via Auxiliary Rotation Loss.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Self-Supervised Generative Adversarial Networks.
CoRR, 2018

Self-Supervised GAN to Counter Forgetting.
CoRR, 2018

The GAN Landscape: Losses, Architectures, Regularization, and Normalization.
CoRR, 2018

MemGEN: Memory is All You Need.
CoRR, 2018

2016
Semi-Supervised Cross-Media Feature Learning With Unified Patch Graph Regularization.
IEEE Trans. Circuits Syst. Video Technol., 2016

2014
Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization.
IEEE Trans. Circuits Syst. Video Technol., 2014

2013
Cross-media retrieval by intra-media and inter-media correlation mining.
Multim. Syst., 2013

PKU_ICST at TRECVID2013: Instance Search Task.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Cross-media retrieval by cluster-based correlation analysis.
Proceedings of the IEEE International Conference on Image Processing, 2013

Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
PKU-ICST @TRECVID2012: Known-item Search Task.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval.
Proceedings of the Advances in Multimedia Modeling - 18th International Conference, 2012

PDSS: patch-descriptor-similarity space for effective face verification.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Tri-space and ranking based heterogeneous similarity measure for cross-media retrieval.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

Cross-modality correlation propagation for cross-media retrieval.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2009
PKU-ICST at TRECVID2009: High Level Feature Extraction and Search.
Proceedings of the TRECVID 2009 workshop participants notebook papers, 2009

2006
Adaptive Control Based on Recurrent Fuzzy Wavelet Neural Network and Its Application on Robotic Tracking Control.
Proceedings of the Advances in Neural Networks - ISNN 2006, Third International Symposium on Neural Networks, Chengdu, China, May 28, 2006

