Zhaoxin Fan

ORCID: 0000-0002-6324-1712

According to our database, Zhaoxin Fan authored at least 87 papers between 2016 and 2026.

Bibliography

2026
Phys-EdiGAN: A privacy-preserving method for editing physiological signals in facial videos.
Pattern Recognit., 2026

Unveiling hidden vulnerabilities in digital human generation via adversarial attacks.
Pattern Recognit., 2026

2025
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance.
CoRR, August, 2025

Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction.
CoRR, August, 2025

Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning.
CoRR, August, 2025

Undress to Redress: A Training-Free Framework for Virtual Try-On.
CoRR, August, 2025

MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming.
CoRR, August, 2025

MemOS: A Memory OS for AI System.
CoRR, July, 2025

SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting.
CoRR, June, 2025

RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks.
CoRR, June, 2025

DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation.
CoRR, June, 2025

BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models.
CoRR, May, 2025

AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars.
CoRR, May, 2025

MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis.
CoRR, May, 2025

TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks.
CoRR, May, 2025

Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation.
CoRR, May, 2025

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation.
CoRR, May, 2025

AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline.
CoRR, April, 2025

Unicorn: Text-Only Data Synthesis for Vision Language Model Training.
CoRR, March, 2025

STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM.
CoRR, March, 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis.
CoRR, March, 2025

DH-RAG: A Dynamic Historical Context-Powered Retrieval-Augmented Generation Method for Multi-Turn Dialogue.
CoRR, February, 2025

VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS.
CoRR, February, 2025

TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding.
CoRR, January, 2025

ThicknessVAE: Learning a Lateral Prior for Clothed Human Body Reconstruction.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3-D Object Detection.
IEEE Trans. Instrum. Meas., 2024

A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.
Neural Networks, 2024

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers.
CoRR, 2024

Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images.
CoRR, 2024

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.
CoRR, 2024

Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation.
CoRR, 2024

Moderating the Generalization of Score-based Generative Model.
CoRR, 2024

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction.
CoRR, 2024

LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details.
CoRR, 2024

VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction.
CoRR, 2024

Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.
CoRR, 2024

GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer.
CoRR, 2024

MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling.
CoRR, 2024

A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing.
CoRR, 2024

Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.
CoRR, 2024

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail.
CoRR, 2024

AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration.
CoRR, 2024

Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning.
CoRR, 2024

Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation.
Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ESTGN: Enhanced Self-Mined Text Guided Super-Resolution Network for Superior Image Super Resolution.
Proceedings of the IEEE International Conference on Acoustics, 2024

MLPHand: Real Time Multi-view 3D Hand Reconstruction via MLP Modeling.
Proceedings of the Computer Vision - ECCV 2024, 2024

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Deep semantic-aware remote sensing image deblurring.
Signal Process., October, 2023

Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview.
ACM Comput. Surv., 2023

STDG: Semi-Teacher-Student Training Paradigram for Depth-guided One-stage Scene Graph Generation.
CoRR, 2023

Benchmarking Ultra-High-Definition Image Reflection Removal.
CoRR, 2023

DenseMP: Unsupervised Dense Pre-training for Few-shot Medical Image Segmentation.
CoRR, 2023

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Reconstruction-Aware Prior Distillation for Semi-supervised Point Cloud Completion.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robust Single Image Reflection Removal Against Adversarial Attacks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
SHLE: Devices Tracking and Depth Filtering for Stereo-based Height Limit Estimation.
CoRR, 2022

FuRPE: Learning Full-body Reconstruction from Part Experts.
CoRR, 2022

Human Pose Driven Object Effects Recommendation.
CoRR, 2022

MonoPCNS: Monocular 3D Object Detection via Point Cloud Network Simulation.
CoRR, 2022

PilotAttnNet: Multi-modal Attention Network for End-to-End Steering Control.
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

Unsupervised Multi-Task Learning for 3D Subtomogram Image Alignment, Clustering and Segmentation.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

RPR-Net: A Point Cloud-Based Rotation-Aware Large Scale Place Recognition Network.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image.
Proceedings of the Computer Vision - ECCV 2022, 2022

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.
CoRR, 2021

Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition.
CoRR, 2021

SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers.
CoRR, 2021

MPDNet: A 3D Missing Part Detection Network Based on Point Cloud Segmentation.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
A Graph-based One-Shot Learning Method for Point Cloud Recognition.
Comput. Graph. Forum, 2020

SRNet: A 3D Scene Recognition Network using Static Graph and Dense Semantic Fusion.
Comput. Graph. Forum, 2020

DAGC: Employing Dual Attention and Graph Convolution for Point Cloud based Place Recognition.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

PointFPN: A Frustum-based Feature Pyramid Network for 3D Object Detection.
Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020

2016
A Text Clustering Approach of Chinese News Based on Neural Network Language Model.
Int. J. Parallel Program., 2016

