Wenhao Chai

ORCID: 0000-0003-2611-0008

According to our database, Wenhao Chai authored at least 76 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Visual tracking of dynamic defective contour based on fused long short-term memory model.

Expert Syst. Appl., 2026

2025

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision.

IEEE Trans. Vis. Comput. Graph., October, 2025

UniHPR: Unified Human Pose Representation via Singular Value Contrastive Learning.

CoRR, October, 2025

AutoCode: LLMs as Problem Setters for Competitive Programming.

CoRR, October, 2025

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization.

CoRR, October, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding.

CoRR, October, 2025

Dense Video Understanding with Gated Residual Tokenization.

CoRR, September, 2025

Pose-Guided Transformer for Fine-Grained Action Quality Assessment.

IEEE Trans. Circuits Syst. Video Technol., August, 2025

AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding.

CoRR, July, 2025

ToSA: Token Merging with Spatial Awareness.

CoRR, June, 2025

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

CoRR, June, 2025

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.

CoRR, May, 2025

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning.

CoRR, May, 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action.

CoRR, May, 2025

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark.

CoRR, April, 2025

An Empirical Study of GPT-4o Image Generation Capabilities.

CoRR, April, 2025

EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments.

Katherine Rose Driggs-Campbell

Gaoang Wang

CoRR, March, 2025

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models.

CoRR, March, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think.

CoRR, February, 2025

Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field.

CoRR, February, 2025

PackDiT: Joint Human Motion and Text Generation via Mutual Prompting.

CoRR, January, 2025

Efficient Transfer From Image-Based Large Multimodal Models to Video Tasks.

IEEE Trans. Multim., 2025

PAD: Personalized Alignment of LLMs at Decoding-time.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark.

Christopher D. Manning

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MambaMOT: State-Space Model as Motion Predictor for Multi-Object Tracking.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CityGen: Infinite and Controllable City Layout Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

PromptHaze: Prompting Real-world Dehazing via Depth Anything Model.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Deep Learning Methods for Small Molecule Drug Discovery: A Survey.

IEEE Trans. Artif. Intell., February, 2024

DiffFashion: Reference-Based Fashion Design With Structure-Aware Transfer by Diffusion Models.

IEEE Trans. Multim., 2024

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory.

CoRR, 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos.

CoRR, 2024

PAD: Personalized Alignment at Decoding-Time.

CoRR, 2024

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement.

CoRR, 2024

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft.

CoRR, 2024

CityCraft: A Real Crafter for 3D City Generation.

CoRR, 2024

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering.

CoRR, 2024

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.

CoRR, 2024

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model.

CoRR, 2024

VersaT2I: Improving Text-to-Image Models with Versatile Reward.

CoRR, 2024

Exploring Learning-based Motion Models in Multi-Object Tracking.

CoRR, 2024

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation.

CoRR, 2024

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling.

Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

An Efficient Multi-prior Hybrid Approach for Consistent 3D Generation from Single Images.

Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check.

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Blind Inpainting with Object-Aware Discrimination for Artificial Marker Removal.

Proceedings of the IEEE International Conference on Acoustics, 2024

See and Think: Embodied Agent in Virtual Environment.

Proceedings of the Computer Vision - ECCV 2024, 2024

RT-Pose: A 4D Radar Tensor-Based 3D Human Pose Estimation and Localization Benchmark.

Proceedings of the Computer Vision - ECCV 2024, 2024

NTIRE 2024 Image Shadow Removal Challenge Report.

Florin-Alexandru Vasluianu

Santosh Kumar Vipparthi

Ahmad 'Athif Mohd Faudzi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning Diffusion Texture Priors for Image Restoration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

CityGen: Infinite and Controllable 3D City Layout Generation.

CoRR, 2023

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning.

CoRR, 2023

See and Think: Embodied Agent in Virtual Environment.

CoRR, 2023

Devil in the Number: Towards Robust Multi-modality Data Filter.

CoRR, 2023

Chasing Consistency in Text-to-3D Generation from a Single Image.

CoRR, 2023

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding.

CoRR, 2023

Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement.

CoRR, 2023

User-Aware Prefix-Tuning Is a Good Learner for Personalized Image Captioning.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Sequential Affinity Learning for Video Restoration.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

StableVideo: Text-driven Consistency-aware Diffusion Video Editing.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022

Optimal sensor placement of bridge structure based on sensitivity-effective independence method.

IET Circuits Devices Syst., 2022

Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model.

Volodymyr V. Kindratenko

Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022
