Xin Wen

ORCID: 0000-0003-3898-0406

Affiliations:
  • University of Hong Kong, CVMI Lab, Hong Kong
  • Tongji University, Shanghai, China (former)


According to our database, Xin Wen authored at least 26 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Vision Foundation Models as Generalist Tokenizers for Image Generation.
CoRR, May 2026

ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation.
CoRR, April 2026

Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation.
CoRR, April 2026

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance.
CoRR, January 2026

2025
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation.
CoRR, July 2025

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

"Principal Components" Enable a New Language of Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Learning from Neighbors: Category Extrapolation for Long-Tail Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Granularity Matters in Long-Tail Learning.
CoRR, 2024

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights.
CoRR, 2024

What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights.
Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Can OOD Object Detectors Learn from Foundation Models?
Computer Vision - ECCV 2024, 2024

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Large-Scale 3D Representation Learning with Multi-Dataset Point Prompt Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Parametric Classification for Generalized Category Discovery: A Baseline Study.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
A Simple Parametric Classification Baseline for Generalized Category Discovery.
CoRR, 2022

Self-Supervised Visual Representation Learning with Semantic Grouping.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Temporal Context Aggregation for Video Retrieval with Contrastive Learning.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

2020
Context Encoding for Video Retrieval with Contrastive Learning.
CoRR, 2020

Distilling Visual Priors from Self-Supervised Learning.
Computer Vision - ECCV 2020 Workshops, 2020
