Xingjian He

Orcid: 0000-0001-5396-6253

According to our database, Xingjian He authored at least 43 papers between 2017 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution.

CoRR, May, 2026

Thinking in Streaming Video.

CoRR, March, 2026

Generalized referring expression segmentation driven by instance-oriented queries.

Pattern Recognit., 2026

UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories.

CoRR, December, 2025

Hierarchical Contrastive Learning for Semantic Segmentation.

IEEE Trans. Neural Networks Learn. Syst., June, 2025

Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities.

CoRR, April, 2025

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset.

IEEE Trans. Pattern Anal. Mach. Intell., February, 2025

VLAB: Enhancing Video Language Pretraining by Feature Adapting and Blending.

IEEE Trans. Multim., 2025

CGViT: Cross-image GroupViT for zero-shot semantic segmentation.

Pattern Recognit., 2025

GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation.

Proceedings of the IEEE International Conference on Robotics and Automation, 2025

ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation.

IEEE Trans. Multim., 2024

The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution.

CoRR, 2024

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation.

CoRR, 2024

Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation.

CoRR, 2024

Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation.

CoRR, 2024

Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

LSVOS Challenge Report: Large-Scale Complex and Long Video Object Segmentation.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation.

CoRR, 2023

MMNet: Multi-Mask Network for Referring Image Segmentation.

CoRR, 2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending.

CoRR, 2023

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation.

CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CSDNet: Contrastive Similarity Distillation Network for Multi-lingual Image-Text Retrieval.

Proceedings of the Image and Graphics - 12th International Conference, 2023

WL-MSR: Watch and Listen for Multimodal Subtitle Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

2022

An Efficient Sampling-Based Attention Network for Semantic Segmentation.

IEEE Trans. Image Process., 2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.

CoRR, 2022

2021

Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing.

CoRR, 2021

Global-Local Propagation Network for RGB-D Semantic Segmentation.

CoRR, 2021

Dynamic Warping Network for Semantic Video Segmentation.

Complex., 2021

Consistent-Separable Feature Representation for Semantic Segmentation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

2019

Image fusion method based on simultaneous sparse representation with non-subsampled contourlet transform.

IET Comput. Vis., 2019

2017

Validation of the merged co-variation signal in interacting protein pairs by mirror-dendrogram.

Int. J. Data Min. Bioinform., 2017
