Bin Xiao

ORCID: 0000-0001-6477-5911

Affiliations:
  • Microsoft Cloud+AI, Microsoft Research Asia, China
  • South China University of Technology, School of Electronic and Information Engineering, China


According to our database, Bin Xiao authored at least 28 papers between 2014 and 2023.

Bibliography

2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.
CoRR, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data.
CoRR, 2023

i-Code: An Integrative and Composable Multimodal Learning Framework.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks.
CoRR, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks.
CoRR, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

TinyViT: Fast Pretraining Distillation for Small Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

DaViT: Dual Attention Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

MiniViT: Compressing Vision Transformers with Weight Multiplexing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Unified Contrastive Learning in Image-Text-Label Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Deep High-Resolution Representation Learning for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Florence: A New Foundation Model for Computer Vision.
CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.
CoRR, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CvT: Introducing Convolutions to Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Lite-HRNet: A Lightweight High-Resolution Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Head: Unifying Object Detection Heads With Attentions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates.
CoRR, 2020

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

3D Human Pose Estimation via Explicit Compositional Depth Maps.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Bottom-up Higher-Resolution Networks for Multi-Person Pose Estimation.
CoRR, 2019

High-Resolution Representations for Labeling Pixels and Regions.
CoRR, 2019

Deep High-Resolution Representation Learning for Human Pose Estimation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Simple Baselines for Human Pose Estimation and Tracking.
Proceedings of the Computer Vision - ECCV 2018, 2018

2014
Mariana: Tencent Deep Learning Platform and its Applications.
Proc. VLDB Endow., 2014

