Wenhao Chai

Orcid: 0000-0003-2611-0008

According to our database, Wenhao Chai authored at least 71 papers between 2022 and 2026.

Bibliography

2026
Visual tracking of dynamic defective contour based on fused long short-term memory model.
Expert Syst. Appl., 2026

2025
Pose-Guided Transformer for Fine-Grained Action Quality Assessment.
IEEE Trans. Circuits Syst. Video Technol., August, 2025

AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding.
CoRR, July, 2025

ToSA: Token Merging with Spatial Awareness.
CoRR, June, 2025

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
CoRR, June, 2025

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.
CoRR, May, 2025

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning.
CoRR, May, 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action.
CoRR, May, 2025

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark.
CoRR, April, 2025

An Empirical Study of GPT-4o Image Generation Capabilities.
CoRR, April, 2025

EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments.
CoRR, March, 2025

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models.
CoRR, March, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think.
CoRR, February, 2025

Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field.
CoRR, February, 2025

PackDiT: Joint Human Motion and Text Generation via Mutual Prompting.
CoRR, January, 2025

Efficient Transfer From Image-Based Large Multimodal Models to Video Tasks.
IEEE Trans. Multim., 2025

PAD: Personalized Alignment of LLMs at Decoding-time.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MambaMOT: State-Space Model as Motion Predictor for Multi-Object Tracking.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CityGen: Infinite and Controllable City Layout Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

DiffPO: Diffusion-styled Preference Optimization for Inference Time Alignment of Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

PromptHaze: Prompting Real-world Dehazing via Depth Anything Model.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Deep Learning Methods for Small Molecule Drug Discovery: A Survey.
IEEE Trans. Artif. Intell., February, 2024

DiffFashion: Reference-Based Fashion Design With Structure-Aware Transfer by Diffusion Models.
IEEE Trans. Multim., 2024

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory.
CoRR, 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos.
CoRR, 2024

PAD: Personalized Alignment at Decoding-Time.
CoRR, 2024

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement.
CoRR, 2024

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft.
CoRR, 2024

CityCraft: A Real Crafter for 3D City Generation.
CoRR, 2024

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering.
CoRR, 2024

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.
CoRR, 2024

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model.
CoRR, 2024

VersaT2I: Improving Text-to-Image Models with Versatile Reward.
CoRR, 2024

Exploring Learning-based Motion Models in Multi-Object Tracking.
CoRR, 2024

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation.
CoRR, 2024

Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

An Efficient Multi-prior Hybrid Approach for Consistent 3D Generation from Single Images.
Proceedings of the 6th ACM International Conference on Multimedia in Asia, 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Blind Inpainting with Object-Aware Discrimination for Artificial Marker Removal.
Proceedings of the IEEE International Conference on Acoustics, 2024

See and Think: Embodied Agent in Virtual Environment.
Proceedings of the Computer Vision - ECCV 2024, 2024

RT-Pose: A 4D Radar Tensor-Based 3D Human Pose Estimation and Localization Benchmark.
Proceedings of the Computer Vision - ECCV 2024, 2024

NTIRE 2024 Image Shadow Removal Challenge Report.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning Diffusion Texture Priors for Image Restoration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
CityGen: Infinite and Controllable 3D City Layout Generation.
CoRR, 2023

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning.
CoRR, 2023

See and Think: Embodied Agent in Virtual Environment.
CoRR, 2023

Devil in the Number: Towards Robust Multi-modality Data Filter.
CoRR, 2023

Chasing Consistency in Text-to-3D Generation from a Single Image.
CoRR, 2023

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding.
CoRR, 2023

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision.
CoRR, 2023

Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement.
CoRR, 2023

User-Aware Prefix-Tuning Is a Good Learner for Personalized Image Captioning.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Sequential Affinity Learning for Video Restoration.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

StableVideo: Text-driven Consistency-aware Diffusion Video Editing.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Image Reference-guided Fashion Design with Structure-aware Transfer by Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022
Optimal sensor placement of bridge structure based on sensitivity-effective independence method.
IET Circuits Devices Syst., 2022

Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model.
Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

