Xinhan Di

ORCID: 0009-0001-8855-8628

According to our database, Xinhan Di authored at least 48 papers between 2016 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters.
CoRR, August, 2025

Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos.
CoRR, August, 2025

Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention.
CoRR, August, 2025

JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1.
CoRR, July, 2025

DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis.
CoRR, July, 2025

Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks.
CoRR, May, 2025

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing.
CoRR, May, 2025

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks.
CoRR, May, 2025

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance.
CoRR, April, 2025

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance.
CoRR, March, 2025

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation.
CoRR, March, 2025

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos.
CoRR, March, 2025

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization.
CoRR, March, 2025

Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation.
CoRR, March, 2025

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search.
CoRR, January, 2025

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Hand-Object Pose Estimation and Reconstruction Based on Signed Distance Field and Multiscale Feature Interaction.
IEEE Trans. Ind. Informatics, September, 2024

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning.
CoRR, 2024

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models.
CoRR, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.
CoRR, 2024

Multi-Stage Graph Learning for fMRI Analysis to Diagnose Neuro-Developmental Disorders.
CoRR, 2024

OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning.
CoRR, 2024

OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects.
CoRR, 2024

Towards Full-parameter and Parameter-efficient Self-learning For Endoscopic Camera Depth Estimation.
CoRR, 2024

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation.
CoRR, 2024

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation.
CoRR, 2024

2023
An Attention-Based Signed Distance Field Estimation Method for Hand-Object Reconstruction.
Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023

Dual Attention Poser: Dual Path Body Tracking Based on Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Hierarchical Reinforcement Learning for Furniture Layout in Virtual Indoor Scenes.
CoRR, 2022

LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

2021
Multi-Agent Reinforcement Learning of 3D Furniture Layout Simulation in Indoor Graphics Scenes.
CoRR, 2021

Deep Reinforcement Learning for Producing Furniture Layout in Indoor Scenes.
CoRR, 2021

2020
End-to-end Generative Floor-plan and Layout with Attributes and Relation Graph.
CoRR, 2020

Deep Layout of Custom-size Furniture through Multiple-domain Learning.
CoRR, 2020

Structural Plan of Indoor Scenes with Personalized Preferences.
CoRR, 2020

Towards Adversarial Planning for Indoor Scenes with Rotation.
CoRR, 2020

The Direction-Aware, Learnable, Additive Kernels and the Adversarial Network for Deep Floor Plan Recognition.
CoRR, 2020

Mutual Information Maximization in Graph Neural Networks.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

Structural Plan of Indoor Scenes with Personalized Preferences.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

2019
Neighborhood Enlargement in Graph Neural Networks.
CoRR, 2019

2018
Ambient Hidden Space of Generative Adversarial Networks.
CoRR, 2018

Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification.
CoRR, 2018

PointCNN: Convolution On X-Transformed Points.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
3D Reconstruction of Simple Objects from A Single View Silhouette Image.
CoRR, 2017

Multiplicative Noise Channel in Generative Adversarial Networks.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

2016
Deep Shape from a Low Number of Silhouettes.
Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016

