Zhaoxin Fan
ORCID: 0000-0002-6324-1712
According to our database, Zhaoxin Fan authored at least 131 papers between 2016 and 2026.
Bibliography
2026
A Survey of Talking Head Synthesis Techniques: Portrait Generation, Driving Mechanisms, and Editing.
ACM Comput. Surv., May, 2026
CoRR, May, 2026
State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading.
CoRR, April, 2026
HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders.
CoRR, April, 2026
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results.
CoRR, April, 2026
Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization.
CoRR, April, 2026
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis.
CoRR, April, 2026
Ultraman: ultra-fast and high-resolution texture generation for 3D human reconstruction from a single image.
Mach. Vis. Appl., March, 2026
CoRR, March, 2026
2K Retrofit: Entropy-Guided Efficient Sparse Refinement for High-Resolution 3D Geometry Prediction.
CoRR, March, 2026
CoRR, March, 2026
CoRR, March, 2026
EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization.
CoRR, March, 2026
IEEE Trans. Knowl. Data Eng., February, 2026
CoRR, February, 2026
CoRR, January, 2026
Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems.
CoRR, January, 2026
DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing.
CoRR, January, 2026
Phys-EdiGAN: A privacy-preserving method for editing physiological signals in facial videos.
Pattern Recognit., 2026
Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models.
Pattern Recognit., 2026
Understanding the adversarial robustness of deep learning-based single-pixel imaging.
Pattern Recognit., 2026
Unveiling hidden vulnerabilities in digital human generation via adversarial attacks.
Pattern Recognit., 2026
Enhancing weakly supervised 3D medical image segmentation through probabilistic-aware learning.
Pattern Recognit., 2026
Pattern Recognit., 2026
R-FGDepth: Towards foundation models for recurrent depth learning with frequency-Guided initialization and refinement.
Pattern Recognit., 2026
CoSE: connectivity-oriented semantic enhancement for mitigating hallucinations in multimodal LLMs.
Inf. Fusion, 2026
Entropy-optimized contrastive decoding for hallucination suppression in vision-language-action models.
Neurocomputing, 2026
Hoodie: Hierarchical point cloud and latent code diffusion for joint and conditional generation.
Neurocomputing, 2026
DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing.
Proceedings of the ACM Web Conference 2026, 2026
Beyond Over-Editing: Important Weight Constrained Knowledge Editing in Large Language Models.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026
PSRNet: Progressive Semantic Refinement for Human Parsing via Text Conditioning and Embedding-Based Calibration.
Proceedings of the 2026 International Conference on Multimedia Retrieval, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
CoRR, December, 2025
CoRR, November, 2025
The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities.
CoRR, October, 2025
When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach.
CoRR, October, 2025
Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models.
CoRR, October, 2025
Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack.
CoRR, October, 2025
CoRR, September, 2025
Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation.
CoRR, August, 2025
Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty.
CoRR, August, 2025
CoRR, August, 2025
CoRR, August, 2025
Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning.
CoRR, August, 2025
SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting.
CoRR, June, 2025
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks.
CoRR, June, 2025
DS-TTS: Zero-Shot Speaker Style Adaptation from Voice Clips via Dynamic Dual-Style Feature Modulation.
CoRR, June, 2025
CoRR, May, 2025
MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis.
CoRR, May, 2025
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks.
CoRR, May, 2025
Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation.
CoRR, May, 2025
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation.
CoRR, May, 2025
AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline.
CoRR, April, 2025
CoRR, March, 2025
STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM.
CoRR, March, 2025
ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis.
CoRR, March, 2025
DH-RAG: A Dynamic Historical Context-Powered Retrieval-Augmented Generation Method for Multi-Turn Dialogue.
CoRR, February, 2025
TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding.
CoRR, January, 2025
Comput. Vis. Media, 2025
AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars.
Proceedings of the Pattern Recognition and Computer Vision - 8th Chinese Conference, 2025
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025
Proceedings of the Forty-second International Conference on Machine Learning, 2025
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025
Twin Progressive Generative Adversarial Network For High-Resolution Image Inpainting.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3-D Object Detection.
IEEE Trans. Instrum. Meas., 2024
A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.
Neural Networks, 2024
Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images.
CoRR, 2024
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.
CoRR, 2024
Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation.
CoRR, 2024
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details.
CoRR, 2024
VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction.
CoRR, 2024
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.
CoRR, 2024
CoRR, 2024
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing.
CoRR, 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.
CoRR, 2024
CoRR, 2024
AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration.
CoRR, 2024
Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning.
CoRR, 2024
Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation.
Proceedings of the MultiMedia Modeling - 30th International Conference, 2024
STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
ESTGN: Enhanced Self-Mined Text Guided Super-Resolution Network for Superior Image Super Resolution.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview.
ACM Comput. Surv., 2023
STDG: Semi-Teacher-Student Training Paradigram for Depth-guided One-stage Scene Graph Generation.
CoRR, 2023
CoRR, 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
CoRR, 2022
CoRR, 2022
Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022
Unsupervised Multi-Task Learning for 3D Subtomogram Image Alignment, Clustering and Segmentation.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022
Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image.
Proceedings of the Computer Vision - ECCV 2022, 2022
SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.
CoRR, 2021
Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition.
CoRR, 2021
SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers.
CoRR, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Comput. Graph. Forum, 2020
Comput. Graph. Forum, 2020
DAGC: Employing Dual Attention and Graph Convolution for Point Cloud based Place Recognition.
Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020
Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020
2016
Int. J. Parallel Program., 2016