Zhaoxin Fan

Orcid: 0000-0002-6324-1712

According to our database¹, Zhaoxin Fan authored at least 131 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

A Survey of Talking Head Synthesis Techniques: Portrait Generation, Driving Mechanisms, and Editing.

ACM Comput. Surv., May, 2026

CUBic: Coordinated Unified Bimanual Perception and Control Framework.

CoRR, May, 2026

State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading.

CoRR, April, 2026

HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders.

CoRR, April, 2026

NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results.

Paula Garrido Mellado

CoRR, April, 2026

Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization.

CoRR, April, 2026

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis.

CoRR, April, 2026

Ultraman: ultra-fast and high-resolution texture generation for 3D human reconstruction from a single image.

Mach. Vis. Appl., March, 2026

Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers.

CoRR, March, 2026

2K Retrofit: Entropy-Guided Efficient Sparse Refinement for High-Resolution 3D Geometry Prediction.

CoRR, March, 2026

VAMPO: Policy Optimization for Improving Visual Dynamics in Video Action Models.

CoRR, March, 2026

Lyapunov Probes for Hallucination Detection in Large Foundation Models.

CoRR, March, 2026

EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization.

CoRR, March, 2026

GRPCI: Harnessing Temporal-Spatial Dynamics for Graph Representation Learning.

IEEE Trans. Knowl. Data Eng., February, 2026

MapDream: Task-Driven Map Learning for Vision-Language Navigation.

CoRR, February, 2026

GPO: Growing Policy Optimization for Legged Robot Locomotion and Whole-Body Control.

CoRR, January, 2026

Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems.

CoRR, January, 2026

DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing.

CoRR, January, 2026

Phys-EdiGAN: A privacy-preserving method for editing physiological signals in facial videos.

Pattern Recognit., 2026

Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models.

Pattern Recognit., 2026

Understanding the adversarial robustness of deep learning-based single-pixel imaging.

Pattern Recognit., 2026

Unveiling hidden vulnerabilities in digital human generation via adversarial attacks.

Pattern Recognit., 2026

Enhancing weakly supervised 3D medical image segmentation through probabilistic-aware learning.

Pattern Recognit., 2026

Segment and pick any fruit: Text-prompted robotic harvesting.

Pattern Recognit., 2026

R-FGDepth: Towards foundation models for recurrent depth learning with frequency-Guided initialization and refinement.

Zhaoxin Fan

Gen Li

Zhongkai Zhou

Pattern Recognit., 2026

CoSE: connectivity-oriented semantic enhancement for mitigating hallucinations in multimodal LLMs.

Inf. Fusion, 2026

Entropy-optimized contrastive decoding for hallucination suppression in vision-language-action models.

Neurocomputing, 2026

Hoodie: Hierarchical point cloud and latent code diffusion for joint and conditional generation.

Neurocomputing, 2026

DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing.

Proceedings of the ACM Web Conference 2026, 2026

Beyond Over-Editing: Important Weight Constrained Knowledge Editing in Large Language Models.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

PSRNet: Progressive Semantic Refinement for Human Parsing via Text Conditioning and Embedding-Based Calibration.

Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026

MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

ActAvatar: Temporally-Aware Precise Action Control for Talking Avatars.

CoRR, December, 2025

Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation.

CoRR, November, 2025

The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities.

CoRR, October, 2025

When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach.

CoRR, October, 2025

Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models.

CoRR, October, 2025

Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack.

CoRR, October, 2025

EMG-UP: Unsupervised Personalization in Cross-User EMG Gesture Recognition.

CoRR, September, 2025

Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation.

CoRR, August, 2025

Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty.

CoRR, August, 2025

HieroAction: Hierarchically Guided VLM for Fine-Grained Action Analysis.

CoRR, August, 2025

Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance.

CoRR, August, 2025

Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning.

CoRR, August, 2025

Undress to Redress: A Training-Free Framework for Virtual Try-On.

CoRR, August, 2025

MemOS: A Memory OS for AI System.

CoRR, July, 2025

SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting.

CoRR, June, 2025

RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks.

CoRR, June, 2025

DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation.

CoRR, June, 2025

BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models.

CoRR, May, 2025

MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis.

CoRR, May, 2025

TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks.

CoRR, May, 2025

Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation.

CoRR, May, 2025

Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation.

CoRR, May, 2025

AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline.

CoRR, April, 2025

Unicorn: Text-Only Data Synthesis for Vision Language Model Training.

CoRR, March, 2025

STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM.

CoRR, March, 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis.

CoRR, March, 2025

DH-RAG: A Dynamic Historical Context-Powered Retrieval-Augmented Generation Method for Multi-Turn Dialogue.

CoRR, February, 2025

TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding.

CoRR, January, 2025

VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS.

Comput. Vis. Media, 2025

AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars.

Proceedings of the Pattern Recognition and Computer Vision - 8th Chinese Conference, 2025

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Flexible Multi-view Clustering with Dynamic Views Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Twin Progressive Generative Adversarial Network For High-Resolution Image Inpainting.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2025

Moderating the Generalization of Score-Based Generative Model.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

ThicknessVAE: Learning a Lateral Prior for Clothed Human Body Reconstruction.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3-D Object Detection.

IEEE Trans. Instrum. Meas., 2024

A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.

Neural Networks, 2024

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers.

CoRR, 2024

Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images.

CoRR, 2024

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.

CoRR, 2024

Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation.

CoRR, 2024

LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details.

CoRR, 2024

VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction.

CoRR, 2024

Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.

CoRR, 2024

MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling.

CoRR, 2024

A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing.

CoRR, 2024

Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.

CoRR, 2024

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail.

CoRR, 2024

AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration.

CoRR, 2024

Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning.

CoRR, 2024

Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation.

Yixing Lu

Zhaoxin Fan

Min Xu

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ESTGN: Enhanced Self-Mined Text Guided Super-Resolution Network for Superior Image Super Resolution.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

MLPHand: Real Time Multi-view 3D Hand Reconstruction via MLP Modeling.

Proceedings of the Computer Vision - ECCV 2024, 2024

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Deep semantic-aware remote sensing image deblurring.

Signal Process., October, 2023

Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview.

ACM Comput. Surv., 2023

STDG: Semi-Teacher-Student Training Paradigram for Depth-guided One-stage Scene Graph Generation.

CoRR, 2023

Benchmarking Ultra-High-Definition Image Reflection Removal.

CoRR, 2023

DenseMP: Unsupervised Dense Pre-training for Few-shot Medical Image Segmentation.

CoRR, 2023

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Reconstruction-Aware Prior Distillation for Semi-supervised Point Cloud Completion.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robust Single Image Reflection Removal Against Adversarial Attacks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

SHLE: Devices Tracking and Depth Filtering for Stereo-based Height Limit Estimation.

CoRR, 2022

FuRPE: Learning Full-body Reconstruction from Part Experts.

CoRR, 2022

Human Pose Driven Object Effects Recommendation.

CoRR, 2022

MonoPCNS: Monocular 3D Object Detection via Point Cloud Network Simulation.

CoRR, 2022

PilotAttnNet: Multi-modal Attention Network for End-to-End Steering Control.

Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

Unsupervised Multi-Task Learning for 3D Subtomogram Image Alignment, Clustering and Segmentation.

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

RPR-Net: A Point Cloud-Based Rotation-Aware Large Scale Place Recognition Network.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image.

Proceedings of the Computer Vision - ECCV 2022, 2022

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.

CoRR, 2021

Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition.

CoRR, 2021

SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers.

CoRR, 2021

MPDNet: A 3D Missing Part Detection Network Based on Point Cloud Segmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020

A Graph-based One-Shot Learning Method for Point Cloud Recognition.

Comput. Graph. Forum, 2020

SRNet: A 3D Scene Recognition Network using Static Graph and Dense Semantic Fusion.

Comput. Graph. Forum, 2020

DAGC: Employing Dual Attention and Graph Convolution for Point Cloud based Place Recognition.

Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

PointFPN: A Frustum-based Feature Pyramid Network for 3D Object Detection.

Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020

2016

A Text Clustering Approach of Chinese News Based on Neural Network Language Model.

Int. J. Parallel Program., 2016
