Xiyang Dai

Orcid: 0009-0001-3066-7098

According to our database, Xiyang Dai authored at least 66 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

HyCTAS: Multi-objective hybrid convolution-transformer architecture search for real-time image segmentation.

Neurocomputing, 2026

LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

OmniTracker: Unifying Visual Object Tracking by Tracking-With-Detection.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs.

Abdelrahman Abouelenin

CoRR, March, 2025

Exploring Invariance in Images through One-way Wave Equations.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation.

CoRR, 2024

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search.

CoRR, 2024

Data-Augmentation Based CBAM-ResNet-GCN Method for Unbalance Fault Diagnosis of Rotating Machinery.

IEEE Access, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient Modulation for Vision Networks.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Rewrite the Stars.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

On the Hidden Waves of Image.

CoRR, 2023

Image is First-order Norm+Linear Autoregressive.

CoRR, 2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System.

CoRR, 2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.

CoRR, 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Generalized Decoding for Pixel, Image, and Language.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Self-Supervised Learning based on Heat Equation.

CoRR, 2022

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling.

CoRR, 2022

Should All Proposals be Treated Equally in Object Detection?

CoRR, 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks.

CoRR, 2022

Residual Mixture of Experts.

CoRR, 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks.

CoRR, 2022

GLIPv2: Unifying Localization and Vision-Language Understanding.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Focal Modulation Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Self-supervised Vision Transformers for Representation Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Should All Proposals Be Treated Equally in Object Detection?

Proceedings of the Computer Vision - ECCV 2022, 2022

RegionCLIP: Region-based Language-Image Pretraining.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

BEVT: BERT Pretraining of Video Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Reduce Information Loss in Transformers for Pluralistic Image Inpainting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Mobile-Former: Bridging MobileNet and Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Florence: A New Foundation Model for Computer Vision.

CoRR, 2021

UFO: A UniFied TransfOrmer for Vision-Language Representation Learning.

CoRR, 2021

Focal Self-attention for Local-Global Interactions in Vision Transformers.

CoRR, 2021

Weak NAS Predictors Are All You Need.

CoRR, 2021

Focal Attention for Long-Range Interactions in Vision Transformers.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Stronger NAS with Weaker Predictors.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Revisiting Dynamic Convolution via Matrix Decomposition.

Proceedings of the 9th International Conference on Learning Representations, 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

CvT: Introducing Convolutions to Vision Transformers.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MicroNet: Improving Image Recognition with Extremely Low FLOPs.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic DETR: End-to-End Object Detection with Dynamic Attention.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Dynamic Head: Unifying Object Detection Heads With Attentions.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

MicroNet: Towards Image Recognition with Extremely Low FLOPs.

CoRR, 2020

DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search.

Proceedings of the Computer Vision - ECCV 2020, 2020

Dynamic ReLU.

Proceedings of the Computer Vision - ECCV 2020, 2020

METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos.

Da Zhang

Xiyang Dai

Yuan-Fang Wang

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Dynamic Convolution: Attention Over Convolution Kernels.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

TAN: Temporal Aggregation Network for Dense Multi-Label Action Recognition.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2019

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Modeling Deep Context in Spatial and Temporal Domain.

Xiyang Dai

PhD thesis, 2018

Deep Motion Boundary Detection.

CoRR, 2018

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks.

Proceedings of the British Machine Vision Conference 2018, 2018

Dynamic Temporal Pyramid Network: A Closer Look at Multi-scale Modeling for Activity Detection.

Da Zhang

Xiyang Dai

Yuan-Fang Wang

Proceedings of the Computer Vision - ACCV 2018, 2018

2017

Efficient Fine-Grained Classification and Part Localization Using One Compact Network.

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Temporal Context Network for Activity Localization in Videos.

Proceedings of the IEEE International Conference on Computer Vision, 2017

FASON: First and Second Order Information Fusion Network for Texture Recognition.

Xiyang Dai

Joe Yue-Hei Ng

Larry S. Davis

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Parameterizing Region Covariance: An Efficient Way To Apply Sparse Codes On Second Order Statistics.

CoRR, 2016
