Xing Sun

Orcid: 0000-0001-8132-9083

Affiliations:
  • Tencent Youtu Lab, Shanghai, China
  • University of Hong Kong, Department of Electrical and Electronic Engineering, Hong Kong (PhD 2016)


According to our database, Xing Sun authored at least 123 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs.
CoRR, August, 2025

SinKD: Sinkhorn Distance Minimization for Knowledge Distillation.
IEEE Trans. Neural Networks Learn. Syst., July, 2025

DREAM: Document Reconstruction via End-to-end Autoregressive Model.
CoRR, July, 2025

DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE.
CoRR, June, 2025

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models.
CoRR, June, 2025

TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs.
CoRR, May, 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model.
CoRR, May, 2025

LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?
CoRR, March, 2025

FlowAgent: Achieving Compliance and Flexibility for Workflow Agents.
CoRR, February, 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy.
CoRR, February, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her.
CoRR, January, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.
CoRR, January, 2025

Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models.
Trans. Mach. Learn. Res., 2025

Distilling consistent relations for multi-source domain adaptive person re-identification.
Pattern Recognit., 2025

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

RocketEval: Efficient automated LLM evaluation via grading checklist.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Knowledge Transfer Across Modalities for Weakly Supervised Point Cloud Semantic Segmentation.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Probability-Density-aware Semi-supervised Learning.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Turning a CLIP Model Into a Scene Text Spotter.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Multi-dataset Detection with Transformers.
Int. J. Comput. Vis., July, 2024

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs.
CoRR, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs.
CoRR, 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM.
CoRR, 2024

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing.
CoRR, 2024

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data.
CoRR, 2024

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models.
CoRR, 2024

VITA: Towards Open-Source Interactive Omni Multimodal LLM.
CoRR, 2024

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models.
CoRR, 2024

FinVerse: An Autonomous Agent System for Versatile Financial Analysis.
CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024

RESTORE: Towards Feature Shift for Vision-Language Prompt Learning.
CoRR, 2024

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema.
CoRR, 2024

Woodpecker: hallucination correction for multimodal large language models.
Sci. China Inf. Sci., 2024

Multimodal Inplace Prompt Tuning for Open-set Object Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Multimodal Label Relevance Ranking via Reinforcement Learning.
Proceedings of the Computer Vision - ECCV 2024, 2024

Aligning and Prompting Everything All at Once for Universal Visual Perception.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HRVDA: High-Resolution Visual Document Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A General and Efficient Training for Transformer via Token Expansion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Sinkhorn Distance Minimization for Knowledge Distillation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Visual Hallucination Elevates Speech Recognition.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Reciprocal normalization for domain adaptation.
Pattern Recognit., August, 2023

Co-Salient Object Detection With Co-Representation Purification.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples.
CoRR, 2023

Towards Robust Text Retrieval with Progressive Learning.
CoRR, 2023

Unified and Dynamic Graph for Temporal Character Grouping in Long Videos.
CoRR, 2023

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation.
CoRR, 2023

A Survey on Multimodal Large Language Models.
CoRR, 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models.
CoRR, 2023

Looking and Listening: Audio Guided Text Recognition.
CoRR, 2023

SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger.
CoRR, 2023

Graph-Based Self-Learning for Robust Person Re-identification.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Mitigating Memorization of Noisy Labels via Regularization between Representations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Span-level Aspect-based Sentiment Analysis via Table Filling.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Conditional Feature Embedding by Visual Clue Correspondence Graph for Person Re-Identification.
IEEE Trans. Image Process., 2022

Conditional Feature Learning Based Transformer for Text-Based Person Search.
IEEE Trans. Image Process., 2022

Self-supervised Models are Good Teaching Assistants for Vision Transformers.
Proceedings of the International Conference on Machine Learning, 2022

PAC-Net: Highlight Your Video via History Preference Modeling.
Proceedings of the Computer Vision - ECCV 2022, 2022

DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Efficient Decoder-Free Object Detection with Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Training-free Transformer Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Learning fused features with parallel training for person re-identification.
Knowl. Based Syst., 2021

RMNet: Equivalently Removing Residual Connection from Networks.
CoRR, 2021

Demystifying How Self-Supervised Features Improve Training from Noisy Labels.
CoRR, 2021

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning.
CoRR, 2021

On Evolving Attention Towards Domain Adaptation.
CoRR, 2021

Part2Whole: Iteratively Enrich Detail for Cross-Modal Retrieval with Partial Query.
CoRR, 2021

On The Consistency Training for Open-Set Semi-Supervised Learning.
CoRR, 2021

Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search.
CoRR, 2021

Image generation and constrained two-stage feature fusion for person re-identification.
Appl. Intell., 2021

Discriminator-free Generative Adversarial Attack.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Integrated Modalities And Multi-Level Granularity: Towards A Unified Video-Text Retrieval Framework.
Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops, 2021

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach.
Proceedings of the 9th International Conference on Learning Representations, 2021

Learning to Know Where to See: A Visibility-Aware Approach for Occluded Person Re-identification.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PR-Net: Preference Reasoning for Personalized Video Highlight Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Removing the Background by Adding the Background: Towards Background Robust Self-Supervised Video Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

One for More: Selecting Generalizable Samples for Generalizable ReID Model.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning.
CoRR, 2020

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion.
CoRR, 2020

Devil's in the Detail: Graph-based Key-point Alignment and Embedding for Person Re-ID.
CoRR, 2020

DGD: Densifying the Knowledge of Neural Networks with Filter Grafting and Knowledge Distillation.
CoRR, 2020

Pruning Filter in Filter.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians.
Proceedings of the Computer Vision - ECCV 2020, 2020

Filter Grafting for Deep Neural Networks.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Computational Light Field Generation Using Deep Nonparametric Bayesian Learning.
IEEE Access, 2019

The Seventh Visual Object Tracking VOT2019 Challenge Results.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
A Coarse-to-fine Pyramidal Model for Person Re-identification via Multi-Loss Dynamic Training.
CoRR, 2018

2017
Computationally Efficient Hyperspectral Data Learning Based on the Doubly Stochastic Dirichlet Process.
IEEE Trans. Geosci. Remote. Sens., 2017

Human arm pose modeling with learned features using joint convolutional neural network.
Mach. Vis. Appl., 2017

2016
Unsupervised Tracking With the Doubly Stochastic Dirichlet Process Mixture Model.
IEEE Trans. Intell. Transp. Syst., 2016

Consistency Analysis for the Doubly Stochastic Dirichlet Process.
CoRR, 2016

Data-driven light field depth estimation using deep Convolutional Neural Networks.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Sparse Hierarchical Nonparametric Bayesian learning for light field representation and denoising.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Unsupervised tracking with a low computational cost using the doubly stochastic Dirichlet process mixture model.
Proceedings of the Image Processing: Machine Vision Applications IX, 2016

