Xinhan Di
Orcid: 0009-0001-8855-8628
According to our database, Xinhan Di authored at least 49 papers between 2016 and 2026.
Bibliography
2026
OCC-MLLM-CoT: Self-correction enhanced occlusion recognition with large language models via 3D-aware supervision, chain-of-thoughts guidance.
Image Vis. Comput., 2026
2025
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters.
CoRR, August, 2025
Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos.
CoRR, August, 2025
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention.
CoRR, August, 2025
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1.
CoRR, July, 2025
CoRR, May, 2025
Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks.
CoRR, May, 2025
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance.
CoRR, April, 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance.
CoRR, March, 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation.
CoRR, March, 2025
CoRR, March, 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization.
CoRR, March, 2025
CoRR, January, 2025
DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025
Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025
Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation.
Proceedings of the IEEE Conference on Artificial Intelligence, 2025
Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
Hand-Object Pose Estimation and Reconstruction Based on Signed Distance Field and Multiscale Feature Interaction.
IEEE Trans. Ind. Informatics, September, 2024
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning.
CoRR, 2024
Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models.
CoRR, 2024
YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.
CoRR, 2024
Multi-Stage Graph Learning for fMRI Analysis to Diagnose Neuro-Developmental Disorders.
CoRR, 2024
OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning.
CoRR, 2024
OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects.
CoRR, 2024
Towards Full-parameter and Parameter-efficient Self-learning For Endoscopic Camera Depth Estimation.
CoRR, 2024
Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation.
CoRR, 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation.
CoRR, 2024
2023
An Attention-Based Signed Distance Field Estimation Method for Hand-Object Reconstruction.
Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
CoRR, 2022
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022
2021
Multi-Agent Reinforcement Learning of 3D Furniture Layout Simulation in Indoor Graphics Scenes.
CoRR, 2021
CoRR, 2021
2020
CoRR, 2020
The Direction-Aware, Learnable, Additive Kernels and the Adversarial Network for Deep Floor Plan Recognition.
CoRR, 2020
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020
2019
2018
Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification.
CoRR, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
2017
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017
Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017
2016
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016