Xiaoshuai Sun

ORCID: 0000-0003-3912-9306

According to our database¹, Xiaoshuai Sun authored at least 274 papers between 2008 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2026

Boosting Filter Optimization and Prompt-Guided Decoding for Mixed Degradation Image Restoration.

Int. J. Comput. Vis., June, 2026

An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing.

Int. J. Comput. Vis., May, 2026

Plan Before Search: Search Agents Need Plan.

CoRR, May, 2026

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reasoning.

CoRR, May, 2026

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models.

CoRR, May, 2026

Boosting Multi-Modal Large Language Model With Enhanced Visual Features.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2026

CoP: Chain of Perception for Referring 3D Instance Segmentation.

Int. J. Comput. Vis., April, 2026

Towards Parameter-Efficient Network Pruning with Re-Parameterized Adapter.

Int. J. Comput. Vis., April, 2026

Not All Attention is Needed: Parameter and Computation Efficient Tuning for Multi-modal Large Language Models via Effective Attention Skipping.

Int. J. Comput. Vis., March, 2026

Scaling the Long Video Understanding of Multimodal Large Language Models via Visual Memory Mechanism.

CoRR, March, 2026

Persistent Story World Simulation with Continuous Character Customization.

CoRR, March, 2026

Test-Time Computing for Referring Multimodal Large Language Models.

CoRR, February, 2026

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models.

CoRR, February, 2026

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval.

CoRR, January, 2026

3D-STMN++: Leveraging semantic proxies to enhance superpoint-text matching for 3D Referring Expression Segmentation.

Pattern Recognit., 2026

Wavelet-based learning and optimized sampling for image deraining.

Pattern Recognit., 2026

Domain incremental learning for object detection.

Pattern Recognit., 2026

SFIR: Optimizing spatial and frequency domains for image restoration.

Pattern Recognit., 2026

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

TraDiffusion: Trajectory-Based Training-Free Image Generation.

Int. J. Comput. Vis., December, 2025

Omni-Referring Image Segmentation.

CoRR, December, 2025

M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis.

CoRR, December, 2025

Creating High-Quality 3D Content by Bridging the Gap between Text-to-2D and Text-to-3D Generation.

ACM Trans. Multim. Comput. Commun. Appl., November, 2025

NICE: Improving Panoptic Narrative Detection and Segmentation With Cascading Collaborative Learning.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2025

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2025

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning.

CoRR, October, 2025

MoIL: Momentum Imitation Learning for Efficient Vision-Language Adaptation.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning.

CoRR, May, 2025

Image Captioning via Dynamic Path Customization.

IEEE Trans. Neural Networks Learn. Syst., April, 2025

Conditional Diffusion Models for Camouflaged and Salient Object Detection.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

JM3D & JM3D-LLM: Elevating 3D Representation With Joint Multi-Modal Cues.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Correction: Continual Face Forgery Detection via Historical Distribution Preserving.

Int. J. Comput. Vis., April, 2025

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach.

CoRR, April, 2025

An Efficient and Mixed Heterogeneous Model for Image Restoration.

CoRR, April, 2025

Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection.

CoRR, April, 2025

Continual Face Forgery Detection via Historical Distribution Preserving.

Int. J. Comput. Vis., March, 2025

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning.

CoRR, March, 2025

ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation.

CoRR, March, 2025

Grounded Chain-of-Thought for Multimodal Large Language Models.

CoRR, March, 2025

Towards General Visual-Linguistic Face Forgery Detection (V2).

CoRR, February, 2025

AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection.

CoRR, February, 2025

ME-FAS: Multimodal Text Enhancement for Cross-Domain Face Anti-Spoofing.

IEEE Trans. Inf. Forensics Secur., 2025

M3ixup: A multi-modal data augmentation approach for image captioning.

Pattern Recognit., 2025

Optical remote sensing image salient object detection via bidirectional cross-attention and attention restoration.

Pattern Recognit., 2025

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

InterID: Improving Multi-ID Interaction for Personalized Image Generation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Aigi-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

ACL: Activating Capability of Linear Attention for Image Restoration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Towards General Visual-Linguistic Face Forgery Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Towards Language-Guided Visual Recognition via Dynamic Convolutions.

Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.

IEEE Trans. Multim., 2024

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings.

CoRR, 2024

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding.

CoRR, 2024

Any-to-3D Generation via Hybrid Diffusion Supervision.

CoRR, 2024

TraDiffusion: Trajectory-Based Training-Free Image Generation.

CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.

CoRR, 2024

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model.

CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.

CoRR, 2024

Evaluating and Analyzing Relationship Hallucinations in LVLMs.

CoRR, 2024

DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis.

CoRR, 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models.

CoRR, 2024

ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

3D-GRES: Generalized 3D Referring Expression Segmentation.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Deep Instruction Tuning for Segment Anything Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

AnyTrans: Translate AnyText in the Image with Large Scale Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

Multi-branch Collaborative Learning Network for 3D Visual Grounding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Toward Open-Set Human Object Interaction Detection.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Towards local visual modeling for image captioning.

Pattern Recognit., June, 2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.

IEEE Trans. Neural Networks Learn. Syst., 2023

Fast Monocular Depth Estimation via Side Prediction Aggregation with Continuous Spatial Refinement.

IEEE Trans. Multim., 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.

IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.

IEEE Trans. Multim., 2023

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation.

CoRR, 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.

CoRR, 2023

Towards General Visual-Linguistic Face Forgery Detection.

CoRR, 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.

CoRR, 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.

CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.

CoRR, 2023

HSM-QA: Question Answering System Based on Hierarchical Semantic Matching.

IEEE Access, 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Semi-Supervised Panoptic Narrative Grounding.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Clover: Towards A Unified Video-Language Alignment and Fusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.

IEEE Trans. Multim., 2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.

IEEE Trans. Image Process., 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.

IEEE Trans. Image Process., 2022

Plenty is Plague: Fine-Grained Learning for Visual Question Answering.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Fast Class-Wise Updating for Online Hashing.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Modeling long-term video semantic distribution for temporal action proposal generation.

Neurocomputing, 2022

Clover: Towards A Unified Video-Language Alignment and Fusion Model.

CoRR, 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.

CoRR, 2022

End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation.

CoRR, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.

CoRR, 2022

Differentiated Relevances Embedding for Group-based Referring Expression Comprehension.

CoRR, 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Open-Ended Text-to-Face Generation, Combination and Manipulation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.

Proceedings of the Computer Vision - ECCV 2022, 2022

An Information Theoretic Approach for Attention-Driven Face Forgery Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.

Proceedings of the Computer Vision - ECCV 2022, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Active Teacher for Semi-Supervised Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Deep Semantic Parsing of Freehand Sketches With Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning.

Ying Zheng

Hongxun Yao

Xiaoshuai Sun

IEEE Trans. Multim., 2021

Evolving Fully Automated Machine Learning via Life-Long Knowledge Anchors.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Sketch-specific data augmentation for freehand sketch recognition.

Neurocomputing, 2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.

CoRR, 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual-level Collaborative Transformer for Image Captioning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Similarity-Preserving Linkage Hashing for Online Image Retrieval.

IEEE Trans. Image Process., 2020

Deep Saliency Hashing for Fine-Grained Retrieval.

IEEE Trans. Image Process., 2020

TVENet: Temporal variance embedding network for fine-grained action representation.

Pattern Recognit., 2020

Semi-Supervised Adversarial Monocular Depth Estimation.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

What is damaged: a benchmark dataset for abnormal traffic object classification.

Multim. Tools Appl., 2020

Actionness-pooled Deep-convolutional Descriptor for fine-grained action recognition.

Neurocomputing, 2020

Hadamard Matrix Guided Online Hashing.

Int. J. Comput. Vis., 2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Exploring Language Prior for Mode-Sensitive Visual Attention Modeling.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Discovering Latent Discriminative Patterns for Multi-Mode Event Representation.

IEEE Trans. Multim., 2019

Correntropy-Induced Robust Low-Rank Hypergraph.

IEEE Trans. Image Process., 2019

Gradual recovery based occluded digit images recognition.

Multim. Tools Appl., 2019

Action recognition with multi-scale trajectory-pooled 3D convolutional descriptors.

Multim. Tools Appl., 2019

Robust ℓ2-Hypergraph and its applications.

Inf. Sci., 2019

Unsupervised semantic deep hashing.

Neurocomputing, 2019

Hadamard Codebook Based Deep Hashing.

CoRR, 2019

Toward 3D Object Reconstruction from Stereo Images.

CoRR, 2019

Semantic-aware Image Deblurring.

CoRR, 2019

Scene-based Factored Attention for Image Captioning.

CoRR, 2019

Supervised Online Hashing via Similarity Distribution Learning.

CoRR, 2019

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images.

CoRR, 2019

Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method.

IEEE Access, 2019

Information Competing Process for Learning Diversified Representations.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Variational Structured Semantic Inference for Diverse Image Captioning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Multi-modal Multi-layer Fusion Network with Average Binary Center Loss for Face Anti-spoofing.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Hypergraph Induced Convolutional Manifold Networks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

A Video Post-Filter Deblocking Method Based on Temporal Boosting Residual Networks.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dynamic Capsule Attention for Visual Question Answering.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Towards Optimal Fine Grained Retrieval via Decorrelated Centralized Loss with Normalize-Scale Layer.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Towards Optimal Discrete Online Hashing with Balanced Similarity.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Two-Stream 3-D convNet Fusion for Action Recognition in Videos With Arbitrary Size and Length.

IEEE Trans. Multim., 2018

Distinctive action sketch for human action recognition.

Signal Process., 2018

Event patches: Mining effective parts for event detection and understanding.

Signal Process., 2018

Exploring part-aware segmentation for fine-grained visual categorization.

Multim. Tools Appl., 2018

Rediscover flowers structurally.

Multim. Tools Appl., 2018

Hierarchical semantic image matching using CNN feature pyramid.

Comput. Vis. Image Underst., 2018

Semantic and Contrast-Aware Saliency.

Xiaoshuai Sun

CoRR, 2018

The Effectiveness of Instance Normalization: a Strong Baseline for Single Image Dehazing.

CoRR, 2018

Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Add: Actionness-Pooled Deep-Convolutional Descriptor.

Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Cycle-Consistency Based Hierarchical Dense Semantic Correspondence.

Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

Illustrate your travel notes: web-based story visualization.

Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Weighted voxel: a novel voxel representation for 3D reconstruction.

Proceedings of the 10th International Conference on Internet Multimedia Computing and Service, 2018

Restricted Boltzmann Machine Based Active Learning for Sparse Recommendation.

Nguyen Quoc Viet Hung

Proceedings of the Database Systems for Advanced Applications, 2018

GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Strong Baseline for Single Image Dehazing with Deep Features and Instance Normalization.

Proceedings of the British Machine Vision Conference 2018, 2018

2017

Dancelets Mining for Video Recommendation Based on Dance Styles.

IEEE Trans. Multim., 2017

Hierarchical Latent Concept Discovery for Video Event Detection.

IEEE Trans. Image Process., 2017

Breaking video into pieces for action recognition.

Multim. Tools Appl., 2017

Anomaly detection based on spatio-temporal sparse representation and visual attention analysis.

Chen Wang

Hongxun Yao

Xiaoshuai Sun

Multim. Tools Appl., 2017

Exploiting the complementary strengths of multi-layer CNN features for image retrieval.

Neurocomputing, 2017

Actor identification via mining representative actions.

Neurocomputing, 2017

Shallow and Deep Model Investigation for Distinguishing Corn and Weeds.

Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Object Discovery and Cosegmentation Based on Dense Correspondences.

Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Multi-scale Discriminative Patches for Fined-Grained Visual Categorization.

Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Trajectory-Pooled 3D Convolutional Descriptors for Action Recognition.

Proceedings of the Advances in Multimedia Information Processing - PCM 2017, 2017

Gated additive skip context connection for object detection.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Dancing like a superstar: Action guidance based on pose estimation and conditional pose alignment.

Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behavior Prediction.

Quoc Viet Hung Nguyen

Proceedings of the 2017 IEEE International Conference on Data Mining, 2017

An Integrated Model for Effective Saliency Prediction.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Robust spatial-temporal deep model for multimedia event detection.

Litao Yu

Xiaoshuai Sun

Zi Huang

Neurocomputing, 2016

Unsupervised discovery of crowd activities by saliency-based clustering.

Neurocomputing, 2016

Quartet-net Learning for Visual Instance Retrieval.

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Mining representative actions for actor identification.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

深度学习中的自编码器的表达能力研究 (Representation Ability Research of Auto-encoders in Deep Learning).

Jisuanji Kexue (Computer Science), 2015

Strategy for dynamic 3D depth data matching towards robust action retrieval.

Neurocomputing, 2015

Strategy for aesthetic photography recommendation via collaborative composition model.

IET Comput. Vis., 2015

Part-Aware Segmentation for Fine-Grained Categorization.

Proceedings of the Advances in Multimedia Information Processing - PCM 2015, 2015

"Clustering of Dancelets": Towards Video Recommendation Based on Dance Styles.

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Distinctive action sketch.

Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Predicting discrete probability distribution of image emotions.

Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Dual-mode video stabilization based on adaptive motion clustering.

Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

Boost sparse coding based abnormal event detection via explicitly applying temporal continuity constraint.

Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, 2015

2014

Toward Statistical Modeling of Saccadic Eye-Movement and Visual Saliency.

IEEE Trans. Image Process., 2014

Where should I stand? Learning based human position recommendation for mobile photographing.

Multim. Tools Appl., 2014

Using Label Propagation to Get Confidence Map for Segmentation.

Haoran Li

Hongxun Yao

Xiaoshuai Sun

Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Exploring Principles-of-Art Features For Image Emotion Recognition.

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Exploring covert attention for generic boosting of saliency models.

Xiaoshuai Sun

Hongxun Yao

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Structure-aware multi-object discovery for weakly supervised tracking.

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

"Clustering by saliency" - Unsupervised discovery of crowd activities.

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Discriminative Features for Bird Species Classification.

Cheng Pang

Hongxun Yao

Xiaoshuai Sun

Proceedings of the International Conference on Internet Multimedia Computing and Service, 2014

2013

Bidirectional-isomorphic manifold learning at image semantic understanding & representation.

Multim. Tools Appl., 2013

Visual attention modeling based on short-term environmental adaption.

Xiaoshuai Sun

Hongxun Yao

Rongrong Ji

J. Vis. Commun. Image Represent., 2013

Video classification and recommendation based on affective analysis of viewers.

Sicheng Zhao

Hongxun Yao

Xiaoshuai Sun

Neurocomputing, 2013

Flexible Presentation of Videos Based on Affective Content Analysis.

Proceedings of the Advances in Multimedia Modeling, 19th International Conference, 2013

On dense sampling size.

Proceedings of the IEEE International Conference on Image Processing, 2013

Exploring Implicit Image Statistics for Visual Representativeness Modeling.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012

Context-Aware Semi-Local Feature Detector.

ACM Trans. Intell. Syst. Technol., 2012

Task-Dependent Visual-Codebook Compression.

IEEE Trans. Image Process., 2012

Action retrieval based on generalized dynamic depth data matching.

Lujun Chen

Hongxun Yao

Xiaoshuai Sun

Proceedings of the 2012 Visual Communications and Image Processing, 2012

Action Segmentation in Dance Videos.

Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Real-Time Viewfinder Composition Assessment and Recommendation to Mobile Photographing.

Proceedings of the Advances in Multimedia Information Processing - PCM 2012, 2012

Memorable basis: towards human-centralized sparse representation.

Xiaoshuai Sun

Hongxun Yao

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Aesthetic composition representation for portrait photographing recommendation.

Proceedings of the 19th IEEE International Conference on Image Processing, 2012

What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency.

Xiaoshuai Sun

Hongxun Yao

Rongrong Ji

Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

2011

Actor-independent action search using spatiotemporal vocabulary with appearance hashing.

Rongrong Ji

Hongxun Yao

Xiaoshuai Sun

Pattern Recognit., 2011

Video indexing and recommendation based on affective analysis of viewers.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Unsupervised fast anomaly detection in crowds.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Learning heterogeneous data for hierarchical web video classification.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Sparse representation based visual element analysis.

Proceedings of the 18th IEEE International Conference on Image Processing, 2011

Video stabilization based on saliency driven SIFT matching and discriminative RANSAC.

Proceedings of the ICIMCS 2011, 2011

Contextual dictionaries for image super resolution.

Proceedings of the ICIMCS 2011, 2011

A spatiotemporal context phrase description for general dynamic texture.

Proceedings of the ICIMCS 2011, 2011

Affective Video Classification Based on Spatio-temporal Feature Fusion.

Sicheng Zhao

Hongxun Yao

Xiaoshuai Sun

Proceedings of the Sixth International Conference on Image and Graphics, 2011

Saliency Detection: A Self-Adaption Sparse Representation Approach.

Proceedings of the Sixth International Conference on Image and Graphics, 2011

2010

A rotation and scale invariant texture description approach.

Proceedings of the Visual Communications and Image Processing 2010, 2010

Saliency detection based on short-term sparse representation.

Proceedings of the International Conference on Image Processing, 2010

Visual saliency as sequential eye fixation probability.

Proceedings of the International Conference on Image Processing, 2010

A robust texture descriptor using multifractal analysis with Gabor filter.

Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010

Visual topic model for web image annotation.

Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010

Mining actor correlations with hierarchical concurrence parsing.

Proceedings of the IEEE International Conference on Acoustics, 2010

Towards semantic embedding in visual vocabulary.

Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009

Visual and textual fusion for semantically supervised region-based retrieval.

Multim. Syst., 2009

Photo assessment based on computational visual attention model.

Proceedings of the 17th International Conference on Multimedia 2009, 2009

What is a complete set of keywords for image description & annotation on the web.

Proceedings of the 17th International Conference on Multimedia 2009, 2009

VisualCor system: search actor correlations in TV series.

Proceedings of the First International Conference on Internet Multimedia Computing and Service, 2009

2008

Vision-Based Semi-supervised Homecare with Spatial Constraint.

Proceedings of the Advances in Multimedia Information Processing, 2008

Attention-driven action retrieval with DTW-based 3d descriptor matching.

Proceedings of the 16th International Conference on Multimedia 2008, 2008

Place retrieval with graph-based place-view model.

Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Cross-media manifold learning for image retrieval & annotation.

Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Directional correlation analysis of local Haar binary pattern for text detection.

Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

Text Particles Multi-band Fusion for Robust Text Detection.

Proceedings of the Image Analysis and Recognition, 5th International Conference, 2008
