Xitong Yang

Orcid: 0000-0003-4372-241X

According to our database, Xitong Yang authored at least 31 papers between 2015 and 2024.

Bibliography

2024
Video ReCap: Recursive Captioning of Hour-Long Videos.
CoRR, 2024

2023
Integrated immunological analysis of single-cell and bulky tissue transcriptomes reveals the role of interactions between M0 macrophages and naïve CD4+ T cells in the immunosuppressive microenvironment of cervical cancer.
Comput. Biol. Medicine, September, 2023

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data.
CoRR, 2023

MINOTAUR: Multi-task Video Grounding From Multimodal Queries.
CoRR, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
CoRR, 2023

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization.
Proceedings of the International Conference on Machine Learning, 2023

Relational Space-Time Query in Long-Form Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Vision Transformers are Good Mask Auto-Labelers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Scalable Neural Representation for Diverse Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Semi-supervised Vision Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Video Transformers with Spatial-Temporal Token Selection.
Proceedings of the Computer Vision - ECCV 2022, 2022

ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Long-Term Temporal Modeling for Video Action Understanding.
PhD thesis, 2021

Efficient Video Transformers with Spatial-Temporal Token Selection.
CoRR, 2021

Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Hierarchical Contrastive Motion Learning for Video Action Recognition.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

GTA: Global Temporal Attention for Video Action Understanding.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
A Generic Visualization Approach for Convolutional Neural Networks.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
An Interactive Greedy Approach to Group Sparsity in High Dimensions.
Technometrics, 2019

Exploring Uncertainty in Conditional Multi-Modal Retrieval Systems.
CoRR, 2019

Cross-X Learning for Fine-Grained Visual Categorization.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

STEP: Spatio-Temporal Progressive Learning for Video Action Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Deep Temporal Multimodal Fusion for Medical Procedure Monitoring Using Wearable Sensors.
IEEE Trans. Multim., 2018

Two Stream Self-Supervised Learning for Action Recognition.
CoRR, 2018

The Effectiveness of Instance Normalization: a Strong Baseline for Single Image Dehazing.
CoRR, 2018

Strong Baseline for Single Image Dehazing with Deep Features and Instance Normalization.
Proceedings of the British Machine Vision Conference 2018, 2018

Towards Perceptual Image Dehazing by Physics-Based Disentanglement and Adversarial Training.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Tracking Illicit Drug Dealing and Abuse on Instagram Using Multimodal Analysis.
ACM Trans. Intell. Syst. Technol., 2017

Deep Multimodal Representation Learning from Temporal Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2015
Pinterest Board Recommendation for Twitter Users.
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Semantic Video Entity Linking Based on Visual Content and Metadata.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

