Kai Wang

Orcid: 0000-0002-1171-0281

Affiliations:

China Unicom, Beijing, China
CloudMinds Technologies Inc., AI Department, Beijing, China
Huawei Central Research Institute, Beijing, China (2013-2016)
Nanyang Technological University, Singapore (PhD 2013)

According to our database, Kai Wang authored at least 72 papers between 2010 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

MediaClaw: Multimodal Intelligent-Agent Platform Technical Report.

CoRR, May, 2026

Mixture of Heterogeneous Grouped Experts for Language Modeling.

CoRR, April, 2026

TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration.

CoRR, March, 2026

Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning.

CoRR, March, 2026

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation.

CoRR, March, 2026

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment.

CoRR, March, 2026

Beyond Geometry: Artistic Disparity Synthesis for Immersive 2D-to-3D.

CoRR, March, 2026

Enhanced data techniques and optimization in conversational gesture generation.

CCF Trans. Pervasive Comput. Interact., March, 2026

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference.

CoRR, January, 2026

KAConvNet: Kolmogorov-Arnold convolutional networks for vision recognition.

Image Vis. Comput., 2026

HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes.

CoRR, October, 2025

iLearnRobot: An Interactive Learning-Based Multi-Modal Robot with Continuous Improvement.

CoRR, July, 2025

SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models.

CoRR, May, 2025

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization.

CoRR, May, 2025

Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts.

CoRR, March, 2025

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models.

CoRR, March, 2025

Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis.

CoRR, February, 2025

Safety Evaluation of DeepSeek Models in Chinese Contexts.

CoRR, February, 2025

MITS: A large-scale multimodal benchmark dataset for Intelligent Traffic Surveillance.

Image Vis. Comput., 2025

PSTF-AttControl: Per-subject-tuning-free personalized image generation with controllable face attributes.

Image Vis. Comput., 2025

Digital twin-enhanced robotic system for remote diesel engine assembly defect inspection.

Ind. Robot, 2025

Joint Deblurring and 3D Reconstruction for Macrophotography.

Comput. Graph. Forum, 2025

TAD: A Large-Scale Benchmark for Traffic Accidents Detection From Video Surveillance.

IEEE Access, 2025

LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

CP3: Customizable 3D Pop-Out Effect Creation for Immersive Content Using Multimodal Models.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video Generation.

Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Data Leakage Detection in Large Vision-Language Models via Multimodal Perturbation.

Proceedings of the Image and Graphics - 13th International Conference, 2025

Art3D-Fusion: A Hybrid Framework for Visual Synthesis with Artistic Control.

Proceedings of the Image and Graphics - 13th International Conference, 2025

ILearnRobot: An Interactive Learning-Based Multi-modal Robot with Continuous Improvement.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Fuzzy Reasoning Chain (FRC): An Innovative Reasoning Framework from Fuzziness to Clarity.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Optimizing for the Shortest Path in Denoising Diffusion Model.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CORTEX: A Capability-Driven Reasoning Framework for Zero-Shot Large Language Model Selection.

Proceedings of the 2025 9th International Conference on Computer Science and Artificial Intelligence, 2025

Quantifying Capability Boundaries: An Application-Driven Analysis for Large Language Model Selection.

Proceedings of the 2025 9th International Conference on Computer Science and Artificial Intelligence, 2025

2024

Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution.

Image Vis. Comput., 2024

Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models.

CoRR, 2024

Methodology of Adapting Large English Language Models for Specific Cultural Contexts.

CoRR, 2024

CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models.

CoRR, 2024

RAOD: A Benchmark for Road Abandoned Object Detection From Video Surveillance.

IEEE Access, 2024

Optimized Conversational Gesture Generation with Enhanced Motion Feature Extraction and Cascaded Generator.

Proceedings of the Natural Language Processing and Chinese Computing, 2024

What is the Best Model? Application-Driven Evaluation for Large Language Models.

Proceedings of the Natural Language Processing and Chinese Computing, 2024

Reparameterization-Based Parameter-Efficient Fine-Tuning Methods for Large Language Models: A Systematic Survey.

Proceedings of the Natural Language Processing and Chinese Computing, 2024

A Large Vision-Language Model based Environment Perception System for Visually Impaired People.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Query Expansion and Verification with Large Language Model for Information Retrieval.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Self-supervised Visual Anomaly Detection with Image Patch Generation and Comparison Networks.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Spatial-Temporal Transformer Network for Continuous Action Recognition in Industrial Assembly.

Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis.

Proceedings of the Computer Vision - ECCV 2024, 2024

2023

Multimodal Activity Detection for Natural Interaction with Virtual Human.

Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023

2022

Vision-Based Defect Classification and Weight Estimation of Rice Kernels.

IEEE Access, 2022

2021

Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review.

Artif. Intell. Rev., 2021

A Novel Speech-Driven Lip-Sync Model with CNN and LSTM.

Proceedings of the 14th International Congress on Image and Signal Processing, 2021

2020

A survey on face data augmentation for the training of deep neural networks.

Xiang Wang, Kai Wang, Shiguo Lian

Neural Comput. Appl., 2020

2019

Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review.

Guoguang Du, Kai Wang, Shiguo Lian

CoRR, 2019

Synthetic Data Generation and Adaption for Object Detection in Smart Vending Machines.

CoRR, 2019

A Survey on Face Data Augmentation.

Xiang Wang, Kai Wang, Shiguo Lian

CoRR, 2019

Progressive sketching with instant previewing.

Kai Wang, Jianmin Zheng, Hock Soon Seah

Comput. Graph., 2019

Deep Consistent Illumination in Augmented Reality.

Xiang Wang, Kai Wang, Shiguo Lian

Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2019

Video Synthesis of Human Upper Body with Realistic Face.

Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2019

Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System.

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation.

Proceedings of the International Conference on Robotics and Automation, 2019

Real-Time 3D Object Detection and Tracking in Monocular Images of Cluttered Environment.

Proceedings of the Image and Graphics - 10th International Conference, 2019

Deep Learning Based Wearable Assistive System for Visually Impaired People.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

A Realistic Face-to-Face Conversation System Based on Deep Neural Networks.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018

Deep Learning Based Robot for Automatically Picking Up Garbage on the Grass.

IEEE Trans. Consumer Electron., 2018

Virtual-Blind-Road Following-Based Wearable Navigation Device for Blind People.

IEEE Trans. Consumer Electron., 2018

Enhancing Sketching and Sculpting for Shape Modeling.

Kai Wang, Jianmin Zheng, Hock Soon Seah

Proceedings of the 2018 International Conference on Cyberworlds, 2018

2017

Smart guiding glasses for visually impaired people in indoor environment.

IEEE Trans. Consumer Electron., 2017

2015

An intelligent screen system for context-related scenery viewing in smart home.

Kai Wang, Shiguo Lian, Zhaoxiang Liu

IEEE Trans. Consumer Electron., 2015

2014

Automatic user state recognition for hand gesture based low-cost television control system.

Shiguo Lian, Wei Hu, Kai Wang

IEEE Trans. Consumer Electron., 2014

2013

Sketch-based 3D modeling and reconstruction

Kai Wang

PhD thesis, 2013

2010

Reference Plane Assisted Sketching Interface for 3D Freeform Shape Design.

Kai Wang, Jianmin Zheng, Hock Soon Seah

Proceedings of the 2010 International Conference on CyberWorlds, 2010
