Kai Wang

ORCID: 0000-0002-1171-0281

Affiliations:
  • China Unicom, Beijing, China
  • CloudMinds Technologies Inc., AI Department, Beijing, China
  • Huawei Central Research Institute, Beijing, China (2013-2016)
  • Nanyang Technological University, Singapore (PhD 2013)


According to our database, Kai Wang authored at least 71 papers between 2010 and 2026.

Bibliography

2026
Mixture of Heterogeneous Grouped Experts for Language Modeling.
CoRR, April, 2026

TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration.
CoRR, March, 2026

Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning.
CoRR, March, 2026

HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation.
CoRR, March, 2026

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment.
CoRR, March, 2026

Beyond Geometry: Artistic Disparity Synthesis for Immersive 2D-to-3D.
CoRR, March, 2026

Enhanced data techniques and optimization in conversational gesture generation.
CCF Trans. Pervasive Comput. Interact., March, 2026

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference.
CoRR, January, 2026

KAConvNet: Kolmogorov-Arnold convolutional networks for vision recognition.
Image Vis. Comput., 2026

HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation.
CoRR, November, 2025

PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes.
CoRR, October, 2025

iLearnRobot: An Interactive Learning-Based Multi-Modal Robot with Continuous Improvement.
CoRR, July, 2025

SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models.
CoRR, May, 2025

Quantitative Analysis of Performance Drop in DeepSeek Model Quantization.
CoRR, May, 2025

Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts.
CoRR, March, 2025

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models.
CoRR, March, 2025

Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis.
CoRR, February, 2025

Safety Evaluation of DeepSeek Models in Chinese Contexts.
CoRR, February, 2025

MITS: A large-scale multimodal benchmark dataset for Intelligent Traffic Surveillance.
Image Vis. Comput., 2025

PSTF-AttControl: Per-subject-tuning-free personalized image generation with controllable face attributes.
Image Vis. Comput., 2025

Digital twin-enhanced robotic system for remote diesel engine assembly defect inspection.
Ind. Robot, 2025

Joint Deblurring and 3D Reconstruction for Macrophotography.
Comput. Graph. Forum, 2025

TAD: A Large-Scale Benchmark for Traffic Accidents Detection From Video Surveillance.
IEEE Access, 2025

CP3: Customizable 3D Pop-Out Effect Creation for Immersive Content Using Multimodal Models.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video Generation.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Data Leakage Detection in Large Vision-Language Models via Multimodal Perturbation.
Proceedings of the Image and Graphics - 13th International Conference, 2025

Art3D-Fusion: A Hybrid Framework for Visual Synthesis with Artistic Control.
Proceedings of the Image and Graphics - 13th International Conference, 2025

ILearnRobot: An Interactive Learning-Based Multi-modal Robot with Continuous Improvement.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2025

DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Fuzzy Reasoning Chain (FRC): An Innovative Reasoning Framework from Fuzziness to Clarity.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Optimizing for the Shortest Path in Denoising Diffusion Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

CORTEX: A Capability-Driven Reasoning Framework for Zero-Shot Large Language Model Selection.
Proceedings of the 2025 9th International Conference on Computer Science and Artificial Intelligence, 2025

Quantifying Capability Boundaries: An Application-Driven Analysis for Large Language Model Selection.
Proceedings of the 2025 9th International Conference on Computer Science and Artificial Intelligence, 2025

2024
Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution.
Image Vis. Comput., 2024

Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models.
CoRR, 2024

Methodology of Adapting Large English Language Models for Specific Cultural Contexts.
CoRR, 2024

CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models.
CoRR, 2024

RAOD: A Benchmark for Road Abandoned Object Detection From Video Surveillance.
IEEE Access, 2024

Optimized Conversational Gesture Generation with Enhanced Motion Feature Extraction and Cascaded Generator.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

What is the Best Model? Application-Driven Evaluation for Large Language Models.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

Reparameterization-Based Parameter-Efficient Fine-Tuning Methods for Large Language Models: A Systematic Survey.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

A Large Vision-Language Model based Environment Perception System for Visually Impaired People.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

Query Expansion and Verification with Large Language Model for Information Retrieval.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Self-supervised Visual Anomaly Detection with Image Patch Generation and Comparison Networks.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

Spatial-Temporal Transformer Network for Continuous Action Recognition in Industrial Assembly.
Proceedings of the Advanced Intelligent Computing Technology and Applications, 2024

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
Multimodal Activity Detection for Natural Interaction with Virtual Human.
Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023

2022
Vision-Based Defect Classification and Weight Estimation of Rice Kernels.
IEEE Access, 2022

2021
Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review.
Artif. Intell. Rev., 2021

A Novel Speech-Driven Lip-Sync Model with CNN and LSTM.
Proceedings of the 14th International Congress on Image and Signal Processing, 2021

2020
A survey on face data augmentation for the training of deep neural networks.
Neural Comput. Appl., 2020

2019
Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review.
CoRR, 2019

Synthetic Data Generation and Adaption for Object Detection in Smart Vending Machines.
CoRR, 2019

A Survey on Face Data Augmentation.
CoRR, 2019

Progressive sketching with instant previewing.
Comput. Graph., 2019

Deep Consistent Illumination in Augmented Reality.
Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2019

Video Synthesis of Human Upper Body with Realistic Face.
Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2019

Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation.
Proceedings of the International Conference on Robotics and Automation, 2019

Real-Time 3D Object Detection and Tracking in Monocular Images of Cluttered Environment.
Proceedings of the Image and Graphics - 10th International Conference, 2019

Deep Learning Based Wearable Assistive System for Visually Impaired People.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

A Realistic Face-to-Face Conversation System Based on Deep Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018
Deep Learning Based Robot for Automatically Picking Up Garbage on the Grass.
IEEE Trans. Consumer Electron., 2018

Virtual-Blind-Road Following-Based Wearable Navigation Device for Blind People.
IEEE Trans. Consumer Electron., 2018

Enhancing Sketching and Sculpting for Shape Modeling.
Proceedings of the 2018 International Conference on Cyberworlds, 2018

2017
Smart guiding glasses for visually impaired people in indoor environment.
IEEE Trans. Consumer Electron., 2017

2015
An intelligent screen system for context-related scenery viewing in smart home.
IEEE Trans. Consumer Electron., 2015

2014
Automatic user state recognition for hand gesture based low-cost television control system.
IEEE Trans. Consumer Electron., 2014

2013
Sketch-based 3D modeling and reconstruction.
PhD thesis, 2013

2010
Reference Plane Assisted Sketching Interface for 3D Freeform Shape Design.
Proceedings of the 2010 International Conference on CyberWorlds, 2010

