Hongtao Xie
Orcid: 0000-0002-6249-5315
Affiliations:
- University of Science and Technology of China, Hefei, China
According to our database, Hongtao Xie authored at least 165 papers between 2014 and 2026.
Bibliography
2026
IEEE Trans. Pattern Anal. Mach. Intell., May, 2026
Int. J. Mach. Learn. Cybern., May, 2026
CoRR, March, 2026
RegionRAG: Region-level Retrieval-Augmented Generation for Visual Document Understanding.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement.
CoRR, December, 2025
CoRR, November, 2025
CoRR, October, 2025
Int. J. Mach. Learn. Cybern., August, 2025
Distilling Multi-Level Semantic Cues Across Multi-Modalities for Face Forgery Detection.
IEEE Trans. Circuits Syst. Video Technol., May, 2025
IEEE Trans. Circuits Syst. Video Technol., May, 2025
CoRR, May, 2025
IEEE Trans. Circuits Syst. Video Technol., April, 2025
IEEE Trans. Pattern Anal. Mach. Intell., April, 2025
Int. J. Mach. Learn. Cybern., March, 2025
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation.
CoRR, March, 2025
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Coverage of MLLMs.
CoRR, February, 2025
Exploiting Pre-Trained Language Models for Black-Box Attack against Knowledge Graph Embeddings.
ACM Trans. Knowl. Discov. Data, January, 2025
Leveraging Concise Concepts With Probabilistic Modeling for Interpretable Visual Recognition.
IEEE Trans. Multim., 2025
IEEE Trans. Multim., 2025
IEEE Trans. Image Process., 2025
IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
STIDNet: Identity-Aware Face Forgery Detection With Spatiotemporal Knowledge Distillation.
IEEE Trans. Comput. Soc. Syst., August, 2024
Int. J. Comput. Vis., February, 2024
IEEE Trans. Multim., 2024
IEEE Trans. Multim., 2024
IEEE Trans. Multim., 2024
IEIRNet: Inconsistency Exploiting Based Identity Rectification for Face Forgery Detection.
IEEE Trans. Multim., 2024
Exploring Bi-Level Inconsistency via Blended Images for Generalizable Face Forgery Detection.
IEEE Trans. Inf. Forensics Secur., 2024
IEEE Trans. Circuits Syst. Video Technol., 2024
DCFP: Distribution Calibrated Filter Pruning for Lightweight and Accurate Long-Tail Semantic Segmentation.
IEEE Trans. Circuits Syst. Video Technol., 2024
Generalizable Speech Spoofing Detection Against Silence Trimming With Data Augmentation and Multi-Task Meta-Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions.
CoRR, 2024
CoRR, 2024
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing.
CoRR, 2024
Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Leveraging Text Localization for Scene Text Removal via Text-Aware Masked Image Modeling.
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Knowledge Context Modeling with Pre-trained Language Models for Contrastive Knowledge Graph Completion.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Discriminative Feature Mining Based on Frequency Information and Metric Learning for Face Forgery Detection.
IEEE Trans. Knowl. Data Eng., December, 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023
IEEE Trans. Circuits Syst. Video Technol., April, 2023
Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip.
World Wide Web (WWW), March, 2023
ACM Trans. Multim. Comput. Commun. Appl., February, 2023
IEEE Trans. Multim., 2023
IEEE Trans. Multim., 2023
What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties.
IEEE Trans. Image Process., 2023
IEEE Trans. Image Process., 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Multimodal Learning for Temporally Coherent Talking Face Generation With Articulator Synergy.
IEEE Trans. Multim., 2022
Boundary-Aware Arbitrary-Shaped Scene Text Detector With Learnable Embedding Network.
IEEE Trans. Multim., 2022
IEEE Trans. Multim., 2022
ACM Trans. Intell. Syst. Technol., 2022
PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition.
IEEE Trans. Image Process., 2022
IEEE Trans. Image Process., 2022
IEEE Trans. Circuits Syst. Video Technol., 2022
IEEE Trans. Circuits Syst. Video Technol., 2022
IEEE Signal Process. Lett., 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Boat in the Sky: Background Decoupling and Object-aware Pooling for Weakly Supervised Semantic Segmentation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Weakly Supervised Pediatric Bone Age Assessment Using Ultrasonic Images via Automatic Anatomical RoI Detection.
Proceedings of the ICMR '22: International Conference on Multimedia Retrieval, Newark, NJ, USA, June 27, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
IEEE Trans. Multim., 2021
IEEE Trans. Multim., 2021
IEEE Trans. Medical Imaging, 2021
Self-Supervised Attention Mechanism for Pediatric Bone Age Assessment With Efficient Weak Annotation.
IEEE Trans. Medical Imaging, 2021
Pattern Recognit., 2021
Hierarchical multi-view context modelling for 3D object classification and retrieval.
Inf. Sci., 2021
A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks.
CoRR, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021
Proceedings of the Image and Graphics - 11th International Conference, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
IEEE Trans. Neural Networks Learn. Syst., 2020
IEEE Trans. Multim., 2020
Misshapen Pelvis Landmark Detection With Local-Global Feature Learning for Diagnosing Developmental Dysplasia of the Hip.
IEEE Trans. Medical Imaging, 2020
IEEE Trans. Image Process., 2020
IEEE Trans. Image Process., 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Law Is Order: Protecting Multimedia Network Transmission by Game Theory and Mechanism Design.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, 2020
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
ACM Trans. Multim. Comput. Commun. Appl., 2019
IEEE Trans. Multim., 2019
Automated pulmonary nodule detection in CT images using deep convolutional neural networks.
Pattern Recognit., 2019
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019
Proceedings of the MMAsia '19: ACM Multimedia Asia, Beijing, China, December 16-18, 2019, 2019
Proceedings of the 27th ACM International Conference on Multimedia, 2019
Misshapen Pelvis Landmark Detection by Spatial Local Correlation Mining for Diagnosing Developmental Dysplasia of the Hip.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019
Extract Bone Parts Without Human Prior: End-to-end Convolutional Neural Network for Pediatric Bone Age Assessment.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019
Proceedings of the IEEE International Conference on Multimedia and Expo, 2019
Accurate Segmentation of Synaptic Cleft with Contour Growing Concatenated with a Convnet.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Supervised Hash Coding With Deep Neural Network for Environment Perception of Intelligent Vehicles.
IEEE Trans. Intell. Transp. Syst., 2018
Effective Uyghur Language Text Detection in Complex Background Images for Traffic Prompt Identification.
IEEE Trans. Intell. Transp. Syst., 2018
Neuroinformatics, 2018
Potential of Attention Mechanism for Classification of Optical Coherence Tomography Images.
Proceedings of the IEEE Visual Communications and Image Processing, 2018
Proceedings of the MultiMedia Modeling - 24th International Conference, 2018
Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the Knowledge Science, Engineering and Management, 2018
2017
Mach. Vis. Appl., 2017
Mach. Vis. Appl., 2017
Detecting Uyghur text in complex background images with convolutional neural network.
Multim. Tools Appl., 2017
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017
Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing, 2017
2015
Frontiers Comput. Sci., 2015
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015
2014
J. Vis. Commun. Image Represent., 2014