Xinhan Di

Orcid: 0009-0001-8855-8628

According to our database, Xinhan Di authored at least 50 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2026

OCC-MLLM-CoT: Self-correction enhanced occlusion recognition with large language models via 3D-aware supervision, chain-of-thoughts guidance.

Image Vis. Comput., 2026

OCC-MLLM-V1: Occlusion reasoning with commonsense-guided Multi-modal LLM based agent via internal Chain-of-Thoughts (CoTs).

Comput. Vis. Image Underst., 2026

2025

LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters.

CoRR, August, 2025

Preview WB-DH: Towards Whole Body Digital Human Bench for the Generation of Whole-body Talking Avatar Videos.

CoRR, August, 2025

Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention.

Xinhan Di

JoyJiaoW

CoRR, August, 2025

JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1.

Xinhan Di

Kristin Qi

Pengqian Yu

CoRR, July, 2025

Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks.

CoRR, May, 2025

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks.

CoRR, May, 2025

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance.

Chaoyi Wang

Baoqing Li

Xinhan Di

CoRR, April, 2025

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance.

CoRR, March, 2025

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation.

CoRR, March, 2025

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos.

CoRR, March, 2025

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization.

CoRR, March, 2025

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search.

CoRR, January, 2025

DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing.

Proceedings of the 26th Annual Conference of the International Speech Communication Association, 2025

Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation.

Kristin Qi

Xinhan Di

Proceedings of the IEEE Conference on Artificial Intelligence, 2025

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Hand-Object Pose Estimation and Reconstruction Based on Signed Distance Field and Multiscale Feature Interaction.

IEEE Trans. Ind. Informatics, September, 2024

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning.

CoRR, 2024

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models.

CoRR, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.

CoRR, 2024

Multi-Stage Graph Learning for fMRI Analysis to Diagnose Neuro-Developmental Disorders.

CoRR, 2024

OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning.

Shuxin Yang

Xinhan Di

CoRR, 2024

OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects.

Wenmo Qiu

Xinhan Di

CoRR, 2024

Towards Full-parameter and Parameter-efficient Self-learning For Endoscopic Camera Depth Estimation.

CoRR, 2024

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation.

CoRR, 2024

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation.

CoRR, 2024

2023

An Attention-Based Signed Distance Field Estimation Method for Hand-Object Reconstruction.

Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023

Dual Attention Poser: Dual Path Body Tracking Based on Attention.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Hierarchical Reinforcement Learning for Furniture Layout in Virtual Indoor Scenes.

Xinhan Di

Pengqian Yu

CoRR, 2022

LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction.

Xinhan Di

Pengqian Yu

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

2021

Multi-Agent Reinforcement Learning of 3D Furniture Layout Simulation in Indoor Graphics Scenes.

Xinhan Di

Pengqian Yu

CoRR, 2021

Deep Reinforcement Learning for Producing Furniture Layout in Indoor Scenes.

Xinhan Di

Pengqian Yu

CoRR, 2021

2020

End-to-end Generative Floor-plan and Layout with Attributes and Relation Graph.

CoRR, 2020

Deep Layout of Custom-size Furniture through Multiple-domain Learning.

CoRR, 2020

Structural Plan of Indoor Scenes with Personalized Preferences.

CoRR, 2020

Towards Adversarial Planning for Indoor Scenes with Rotation.

CoRR, 2020

The Direction-Aware, Learnable, Additive Kernels and the Adversarial Network for Deep Floor Plan Recognition.

CoRR, 2020

Mutual Information Maximization in Graph Neural Networks.

Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

Structural Plan of Indoor Scenes with Personalized Preferences.

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

2019

Neighborhood Enlargement in Graph Neural Networks.

CoRR, 2019

2018

Ambient Hidden Space of Generative Adversarial Networks.

Xinhan Di

Pengqian Yu

Meng Tian

CoRR, 2018

Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification.

Xinhan Di

Pengqian Yu

Meng Tian

CoRR, 2018

PointCNN: Convolution On X-Transformed Points.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

3D Reconstruction of Simple Objects from A Single View Silhouette Image.

Xinhan Di

Pengqian Yu

CoRR, 2017

Multiplicative Noise Channel in Generative Adversarial Networks.

Xinhan Di

Pengqian Yu

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks.

Xinhan Di

Pengqian Yu

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

2016

Deep Shape from a Low Number of Silhouettes.

Xinhan Di

Rozenn Dahyot

Mukta Prasad

Proceedings of the Computer Vision - ECCV 2016 Workshops, 2016
