GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation.

Quanwei Yang

Luying Huang

CoRR, July, 2025

IGD: Instructional Graphic Design with Multimodal Layer Generation.

CoRR, July, 2025

Test-Time Scaling with Reflective Generative Model.

CoRR, July, 2025

Distilling Multi-Level Semantic Cues Across Multi-Modalities for Face Forgery Detection.

IEEE Trans. Circuits Syst. Video Technol., May, 2025

ReferSAM: Unleashing Segment Anything Model for Referring Image Segmentation.

IEEE Trans. Circuits Syst. Video Technol., May, 2025

From Evaluation to Defense: Advancing Safety in Video Large Language Models.

CoRR, May, 2025

A Detail-Aware Transformer to Generalizable Face Forgery Detection.

IEEE Trans. Circuits Syst. Video Technol., April, 2025

DHVT: Dynamic Hybrid Vision Transformer for Small Dataset Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2025

Self-refined variational transformer for image-conditioned layout generation.

Int. J. Mach. Learn. Cybern., March, 2025

Mask2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation.

CoRR, March, 2025

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability.

CoRR, March, 2025

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs.

CoRR, February, 2025

THGS: Lifelike Talking Human Avatar Synthesis From Monocular Video Via 3D Gaussian Splatting.

Comput. Graph. Forum, February, 2025

Exploiting Pre-Trained Language Models for Black-Box Attack against Knowledge Graph Embeddings.

ACM Trans. Knowl. Discov. Data, January, 2025

Leveraging Concise Concepts With Probabilistic Modeling for Interpretable Visual Recognition.

IEEE Trans. Multim., 2025

High Fidelity Face Swapping via Facial Texture and Structure Consistency Mining.

IEEE Trans. Multim., 2025

Denoised and Dynamic Alignment Enhancement for Zero-Shot Learning.

IEEE Trans. Image Process., 2025

IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

CPL: Curriculum Pseudo Labeling for Weakly Supervised Temporal Forgery Localization.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

STIDNet: Identity-Aware Face Forgery Detection With Spatiotemporal Knowledge Distillation.

IEEE Trans. Comput. Soc. Syst., August, 2024

CDistNet: Perceiving Multi-domain Character Distance for Robust Text Recognition.

Int. J. Comput. Vis., February, 2024

Semantic-Enhanced Proxy-Guided Hashing for Long-Tailed Image Retrieval.

IEEE Trans. Multim., 2024

Balanced Classification: A Unified Framework for Long-Tailed Object Detection.

IEEE Trans. Multim., 2024

Towards Discriminative Feature Generation for Generalized Zero-Shot Learning.

IEEE Trans. Multim., 2024

IEIRNet: Inconsistency Exploiting Based Identity Rectification for Face Forgery Detection.

IEEE Trans. Multim., 2024

Exploring Bi-Level Inconsistency via Blended Images for Generalizable Face Forgery Detection.

IEEE Trans. Inf. Forensics Secur., 2024

Cascade Semantic Prompt Alignment Network for Image Captioning.

IEEE Trans. Circuits Syst. Video Technol., 2024

DCFP: Distribution Calibrated Filter Pruning for Lightweight and Accurate Long-Tail Semantic Segmentation.

IEEE Trans. Circuits Syst. Video Technol., 2024

Generalizable Speech Spoofing Detection Against Silence Trimming With Data Augmentation and Multi-Task Meta-Learning.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Symmetrical Siamese Network for pose-guided person synthesis.

Comput. Vis. Image Underst., 2024

A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions.

CoRR, 2024

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition.

CoRR, 2024

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing.

CoRR, 2024

Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation.

CoRR, 2024

Hallucination Mitigation Prompts Long-term Video Understanding.

CoRR, 2024

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing.

CoRR, 2024

TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model.

Proceedings of the SIGGRAPH Asia 2024 Conference Papers, 2024

How Control Information Influences Multilingual Text Image Generation and Editing?

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LATextSpotter: Empowering Transformer Decoder with Length Perception Ability.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Leveraging Text Localization for Scene Text Removal via Text-Aware Masked Image Modeling.

Proceedings of the Computer Vision - ECCV 2024, 2024

AlignZeg: Mitigating Objective Misalignment for Zero-Shot Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OTE: Exploring Accurate Scene Text Recognition Using One Token.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiffAM: Diffusion-Based Adversarial Makeup Transfer for Facial Privacy Protection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Knowledge Context Modeling with Pre-trained Language Models for Contrastive Knowledge Graph Completion.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Discriminative Feature Mining Based on Frequency Information and Metric Learning for Face Forgery Detection.

IEEE Trans. Knowl. Data Eng., December, 2023

Meta semi-supervised medical image segmentation with label hierarchy.

Health Inf. Sci. Syst., December, 2023

Constructing Spatio-Temporal Graphs for Face Forgery Detection.

ACM Trans. Web, August, 2023

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting.

IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Neighborhood-Adaptive Multi-Cluster Ranking for Deep Metric Learning.

IEEE Trans. Circuits Syst. Video Technol., April, 2023

Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip.

World Wide Web (WWW), March, 2023

Learning Pixel Affinity Pyramid for Arbitrary-Shaped Text Detection.

ACM Trans. Multim. Comput. Commun. Appl., February, 2023

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection.

IEEE Trans. Multim., 2023

Learning Cross-Channel Representations for Semantic Segmentation.

IEEE Trans. Multim., 2023

What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties.

IEEE Trans. Image Process., 2023

Prototypical Matching Networks for Video Object Segmentation.

IEEE Trans. Image Process., 2023

Learning Complete Topology-Aware Correlations Between Relations for Inductive Link Prediction.

CoRR, 2023

MomentDiff: Generative Video Moment Retrieval from Random to Real.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Frequency-based Zero-Shot Learning with Phase Augmentation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

CARIS: Context-Aware Referring Image Segmentation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

High Fidelity Face Swapping via Semantics Disentanglement and Structure Enhancement.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Dual Dynamic Proxy Hashing Network for Long-tailed Image Retrieval.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

RAIRNet: Region-Aware Identity Rectification for Face Forgery Detection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Difference-Aware Iterative Reasoning Network for Key Relation Detection.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

An Online/Offline Power Data Sharing System Based on Blockchain.

Hongtao Xie

Yang Wang

Xin Tian

Proceedings of the 2023 International Conference on Communication Network and Machine Learning, 2023

Exploring Stroke-Level Modifications for Scene Text Editing.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Multimodal Learning for Temporally Coherent Talking Face Generation With Articulator Synergy.

Lingyun Yu

Hongtao Xie

Yongdong Zhang

IEEE Trans. Multim., 2022

Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network.

IEEE Trans. Multim., 2022

Online Residual Quantization Via Streaming Data Correlation Preserving.

IEEE Trans. Multim., 2022

Dynamic-Aware Federated Learning for Face Forgery Video Detection.

ACM Trans. Intell. Syst. Technol., 2022

PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition.

IEEE Trans. Image Process., 2022

Deep Fourier Ranking Quantization for Semi-Supervised Image Retrieval.

IEEE Trans. Image Process., 2022

Bilateral Temporal Re-Aggregation for Weakly-Supervised Video Object Segmentation.

IEEE Trans. Circuits Syst. Video Technol., 2022

Self-Supervised Synthesis Ranking for Deep Metric Learning.

IEEE Trans. Circuits Syst. Video Technol., 2022

Semi-Supervised Text Detection With Accurate Pseudo-Labels.

IEEE Signal Process. Lett., 2022

Attention-guided transformation-invariant attack for black-box adversarial examples.

Int. J. Intell. Syst., 2022

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Boat in the Sky: Background Decoupling and Object-aware Pooling for Weakly Supervised Semantic Segmentation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dual Part Discovery Network for Zero-Shot Learning.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Geometry Aligned Variational Transformer for Image-conditioned Layout Generation.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Weakly Supervised Pediatric Bone Age Assessment Using Ultrasonic Images via Automatic Anatomical RoI Detection.

Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022

Detecting Tampered Scene Text in the Wild.

Proceedings of the Computer Vision - ECCV 2022, 2022

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval.

Proceedings of the Computer Vision - ECCV 2022, 2022

Partial Class Activation Attention for Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Neighborhood-Adaptive Structure Augmented Metric Learning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection.

IEEE Trans. Multim., 2021

Domain-Oriented Semantic Embedding for Zero-Shot Learning.

IEEE Trans. Multim., 2021

A Mutually Attentive Co-Training Framework for Semi-Supervised Recognition.

IEEE Trans. Multim., 2021

Hip Landmark Detection With Dependency Mining in Ultrasound Image.

IEEE Trans. Medical Imaging, 2021

Self-Supervised Attention Mechanism for Pediatric Bone Age Assessment With Efficient Weak Annotation.

Chuanbin Liu

Hongtao Xie

Yongdong Zhang

IEEE Trans. Medical Imaging, 2021

PRRNet: Pixel-Region relation network for face forgery detection.

Pattern Recognit., 2021

Hierarchical multi-view context modelling for 3D object classification and retrieval.

Inf. Sci., 2021

A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks.

CoRR, 2021

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning.

CoRR, 2021

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval.

CoRR, 2021

TDI TextSpotter: Taking Data Imbalance into Account in Scene Text Spotting.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Cluster and Scatter: A Multi-grained Active Semi-supervised Learning Framework for Scalable Person Re-identification.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition.

Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Dynamic Inconsistency-aware DeepFake Video Detection.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Global Characteristic Guided Landmark Detection for Genu Valgus and Varus Diagnosis.

Proceedings of the Image and Graphics - 11th International Conference, 2021

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Robust Deep Co-Saliency Detection With Group Semantic and Pyramid Attention.

IEEE Trans. Neural Networks Learn. Syst., 2020

Bidirectional Attention-Recognition Model for Fine-Grained Object Classification.

IEEE Trans. Multim., 2020

Misshapen Pelvis Landmark Detection With Local-Global Feature Learning for Diagnosing Developmental Dysplasia of the Hip.

IEEE Trans. Medical Imaging, 2020

Mining Spatial-Temporal Similarity for Visual Tracking.

IEEE Trans. Image Process., 2020

Multi-Objective Matrix Normalization for Fine-Grained Visual Recognition.

IEEE Trans. Image Process., 2020

Global context and boundary structure-guided network for cross-modal organ segmentation.

Inf. Process. Manag., 2020

Hierarchical Granularity Transfer Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Improving Brain Tumor Segmentation with Dilated Pseudo-3D Convolution and Multi-direction Fusion.

Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

Law Is Order: Protecting Multimedia Network Transmission by Game Theory and Mechanism Design.

Chuanbin Liu

Youliang Tian

Hongtao Xie

Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

March on Data Imperfections: Domain Division and Domain Generalization for Semantic Segmentation.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Features Fusion and Decomposition for Age-Invariant Face Recognition.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Learning Rich Attention for Pediatric Bone Age Assessment.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020

Real-World Automatic Makeup via Identity Preservation Makeup Net.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Hierarchical Consistency and Refinement for Semi-supervised Medical Segmentation.

Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, 2020

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Graph Structured Network for Image-Text Matching.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Curriculum Learning for Natural Language Understanding.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

CircleNet for Hip Landmark Detection.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Convolutional Attention Networks for Scene Text Recognition.

ACM Trans. Multim. Comput. Commun. Appl., 2019

Double-Bit Quantization and Index Hashing for Nearest Neighbor Search.

IEEE Trans. Multim., 2019

Automated pulmonary nodule detection in CT images using deep convolutional neural networks.

Pattern Recognit., 2019

Supervised deep hashing for image content security.

Multim. Tools Appl., 2019

Name-face association with web facial image supervision.

Multim. Syst., 2019

Distributed data-dependent locality sensitive hashing.

Int. J. High Perform. Comput. Netw., 2019

Adaptive Alignment Network for Person Re-identification.

Proceedings of the MultiMedia Modeling - 25th International Conference, 2019

WaveCSN: Cascade Segmentation Network for Hip Landmark Detection.

Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Adaptive Bilinear Pooling for Fine-grained Representation Learning.

Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019

Question-Aware Tube-Switch Network for Video Question Answering.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Domain-Specific Embedding Network for Zero-Shot Recognition.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

ACE-Net: Biomedical Image Segmentation with Augmented Contracting and Expansive Paths.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Deep Cascaded Attention Network for Multi-task Brain Tumor Segmentation.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Misshapen Pelvis Landmark Detection by Spatial Local Correlation Mining for Diagnosing Developmental Dysplasia of the Hip.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

Extract Bone Parts Without Human Prior: End-to-end Convolutional Neural Network for Pediatric Bone Age Assessment.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019

DSRN: A Deep Scale Relationship Network for Scene Text Detection.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learning to Draw Text in Natural Images with Conditional Adversarial Networks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Semi-supervised User Profiling with Heterogeneous Graph Attention Networks.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

MLTS: A Multi-Language Scene Text Spotter.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Semantic-Embedding and Shape-Aware U-Net for Ultrasound Eyeball Segmentation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2019

Accurate Segmentation of Synaptic Cleft with Contour Growing Concatenated with a Convnet.

Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Robust Deep Co-Saliency Detection with Group Semantic.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

A Fast Uyghur Text Detector for Complex Background Images.

IEEE Trans. Multim., 2018

Supervised Hash Coding With Deep Neural Network for Environment Perception of Intelligent Vehicles.

IEEE Trans. Intell. Transp. Syst., 2018

Effective Uyghur Language Text Detection in Complex Background Images for Traffic Prompt Identification.

IEEE Trans. Intell. Transp. Syst., 2018

Uyghur Text Matching in Graphic Images for Biomedical Semantic Analysis.

Neuroinformatics, 2018

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification.

CoRR, 2018

Potential of Attention Mechanism for Classification of Optical Coherence Tomography Images.

Proceedings of the IEEE Visual Communications and Image Processing, 2018

Temporal-Contextual Attention Network for Video-Based Person Re-identification.

Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Uyghur Text Localization with Fast Component Detection.

Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Deep Convolutional Nets for Pulmonary Nodule Detection and Classification.

Proceedings of the Knowledge Science, Engineering and Management, 2018

2017

Triple-Bit Quantization with Asymmetric Distance for Image Content Security.

Degang Xu

Hongtao Xie

Chenggang Yan

Mach. Vis. Appl., 2017

Robust and parallel Uyghur text localization in complex background images.

Mach. Vis. Appl., 2017

Residual domain dictionary learning for compressed sensing video recovery.

Multim. Tools Appl., 2017

Detecting Uyghur text in complex background images with convolutional neural network.

Multim. Tools Appl., 2017

RICS-DFA: a space and time-efficient signature matching algorithm with Reduced Input Character Set.

Concurr. Comput. Pract. Exp., 2017

Uyghur Language Text Detection in Complex Background Images Using Enhanced MSERs.

Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

CPMF: A collective pairwise matrix factorization model for upcoming event recommendation.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Supervised deep quantization for efficient image search.

Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017

Double-bit quantization and weighting for nearest neighbor search.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

A new dataset for hand gesture estimation.

Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017

2016

Triple-Bit Quantization with Asymmetric Distance for Nearest Neighbor Search.

Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

Context-Oriented Name-Face Association in Web Videos.

Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

Robust Uyghur Text Localization in Complex Background Images.

Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

2015

Corrigendum to "Fast and scalable lock methods for video coding on many-core architecture" [J. Visual Communication and Image Representation 25(7) (2014) 1758-1762].

J. Vis. Commun. Image Represent., 2015

Fast Search with Data-Oriented Multi-Index Hashing for Multimedia Data.

KSII Trans. Internet Inf. Syst., 2015

Fast approximate matching of binary codes with distinctive bits.

Chenggang Clarence Yan

Frontiers Comput. Sci., 2015

Hierarchical Encoding of Binary Descriptors for Image Matching.

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Data-oriented multi-index hashing.

Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

2014

Contextual Query Expansion for Image Retrieval.

IEEE Trans. Multim., 2014

Extracting salient region for pornographic image detection.

Chenggang Clarence Yan

J. Vis. Commun. Image Represent., 2014

Fast and scalable lock methods for video coding on many-core architecture.

J. Vis. Commun. Image Represent., 2014

Fusing audio vocabulary with visual features for pornographic video detection.

Future Gener. Comput. Syst., 2014

Data-Dependent Locality Sensitive Hashing.

Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Fast Search of Binary Codes with Distinctive Bits.

Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

Video to Article Hyperlinking by Multiple Tag Property Exploration.

Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

The study of methods for post-pruning decision trees based on comprehensive evaluation standard.

Hongtao Xie

Fuhua Shang

Proceedings of the 11th International Conference on Fuzzy Systems and Knowledge Discovery, 2014

Distributed online similarity search in high dimensional space.

Baohui Li

Hongtao Xie

Kefu Xu

Proceedings of the International Conference on Big Data and Smart Computing, BIGCOMP 2014, 2014

2013

Robust common visual pattern discovery using graph matching.

J. Vis. Commun. Image Represent., 2013

2012

Application research of Web3D technology in three-dimensional show of oil well pitshaft information.

Hongtao Xie

Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012

2011

Efficient Feature Detection and Effective Post-Verification for Large Scale Near-Duplicate Image Search.

IEEE Trans. Multim., 2011

Common visual pattern discovery via graph matching.

Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Pairwise weak geometric consistency for large scale image search.

Proceedings of the 1st International Conference on Multimedia Retrieval, 2011

Local geometric consistency constraint for image retrieval.

Proceedings of the 18th IEEE International Conference on Image Processing, 2011

2010

GPU-based fast scale invariant interest point detector.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Effective and Efficient Image Copy Detection Based on GPU.

Proceedings of the Trends and Topics in Computer Vision, 2010

2008

Stereo effect of image converted from planar.

Inf. Sci., 2008

Hongtao Xie
