Keze Wang

Orcid: 0000-0002-7817-8306

According to our database, Keze Wang authored at least 126 papers between 2013 and 2026.

Bibliography

2026
Exploring Talking Head Models with Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation.
ACM Trans. Multim. Comput. Commun. Appl., April, 2026

The Fourth Challenge on Image Super-Resolution (×4) at NTIRE 2026: Benchmark Results and Method Overview.
CoRR, April, 2026

ChunQiuTR: Time-Keyed Temporal Retrieval in Classical Chinese Annals.
CoRR, April, 2026

TAG: Target-Agnostic Guidance for Stable Object-Centric Inference in Vision-Language-Action Models.
CoRR, March, 2026

DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration.
CoRR, March, 2026

A Scalable Curiosity-Driven Game-Theoretic Framework for Long-Tail Multi-Label Learning in Data Mining.
CoRR, February, 2026

AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents.
CoRR, February, 2026

RADAR: Benchmarking Vision-Language-Action Generalization via Real-World Dynamics, Spatial-Physical Intelligence, and Autonomous Evaluation.
CoRR, February, 2026

Process-of-Thought Reasoning for Videos.
CoRR, February, 2026

Spectral Gating Networks.
CoRR, February, 2026

Rational ANOVA Networks.
CoRR, February, 2026

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems.
CoRR, January, 2026

ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation.
CoRR, January, 2026

Weather-R1: Logically Consistent Reinforcement Fine-Tuning for Multimodal Reasoning in Meteorology.
CoRR, January, 2026

3D-Agent: Tri-Modal Multi-Agent Collaboration for Scalable 3D Object Annotation.
CoRR, January, 2026

Stable Language Guidance for Vision-Language-Action Models.
CoRR, January, 2026

Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation.
IEEE Trans. Multim., 2026

Toward Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering.
IEEE Trans. Multim., 2026

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Top-Down Semantic Refinement for Image Captioning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

HiVA: Self-organized Hierarchical Variable Agent via Goal-driven Semantic-Topological Evolution.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention.
CoRR, December, 2025

Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains.
CoRR, December, 2025

CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation.
CoRR, December, 2025

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks.
CoRR, December, 2025

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models.
CoRR, December, 2025

SirenPose: Dynamic Scene Reconstruction via Geometric Supervision.
CoRR, December, 2025

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction.
CoRR, December, 2025

PTTA: A Pure Text-to-Animation Framework for High-Quality Creation.
CoRR, December, 2025

Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction.
CoRR, December, 2025

GTMA: Dynamic Representation Optimization for OOD Vision-Language Models.
CoRR, December, 2025

Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models.
CoRR, December, 2025

Large Language Models as Discounted Bayesian Filters.
CoRR, December, 2025

STORM: Search-Guided Generative World Models for Robotic Manipulation.
CoRR, December, 2025

Massive Editing for Large Language Models Based on Dynamic Weight Generation.
CoRR, December, 2025

Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs.
CoRR, December, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models.
CoRR, December, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models.
CoRR, December, 2025

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models.
CoRR, December, 2025

Causal Invariance and Counterfactual Learning Driven Cooperative Game for Multi-Label Classification.
CoRR, December, 2025

ℰ₀: Enhancing Generalization and Fine-Grained Control in VLA Models via Continuized Discrete Diffusion.
CoRR, November, 2025

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models.
CoRR, October, 2025

Guardian: Decoupling Exploration from Safety in Reinforcement Learning.
CoRR, October, 2025

Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization.
CoRR, October, 2025

Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints.
CoRR, October, 2025

MAT-Agent: Adaptive Multi-Agent Training Optimization.
CoRR, October, 2025

Learning Dynamics of VLM Finetuning.
CoRR, October, 2025

Failure-Driven Workflow Refinement.
CoRR, October, 2025

VideoVerse: How Far is Your T2V Generator from a World Model?
CoRR, October, 2025

CF-VLM: CounterFactual Vision-Language Fine-tuning.
CoRR, June, 2025

From Motion to Behavior: Hierarchical Modeling of Humanoid Generative Behavior Control.
CoRR, June, 2025

Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning.
IEEE Trans. Neural Networks Learn. Syst., May, 2025

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning.
CoRR, May, 2025

TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models.
CoRR, May, 2025

Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation.
CoRR, April, 2025

Kolmogorov-Arnold Fourier Networks.
CoRR, February, 2025

SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting.
IEEE Trans. Image Process., 2025

DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

High-Fidelity Face Swapping via Fine-grained Attribute Control with Diffusion Models.
Proceedings of the International Joint Conference on Neural Networks, 2025

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards More Efficient Post-training via Fourier Domain Adapter Framework.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

CCG: Rare-Label Prediction via Neural SEM-Driven Causal Game.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Reproducible Vision-Language Models Meet Concepts Out of Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Improving Network Interpretability via Explanation Consistency Evaluation.
IEEE Trans. Multim., 2024

Multi-Person 3D Pose Estimation With Occlusion Reasoning.
IEEE Trans. Multim., 2024

Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition.
CoRR, 2024

On Training Data Influence of GPT Models.
CoRR, 2024

Gesture Generation Via Diffusion Model with Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2024

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Adaptive Prompt Routing for Arbitrary Text Style Transfer with Pre-trained Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

NeRF-VPT: Learning Novel View Representations with Neural Radiance Fields via View Prompt Tuning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering.
CoRR, 2023

VisualProg Distiller: Learning to Fine-tune Non-differentiable Visual Programming Frameworks.
CoRR, 2023

Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs.
CoRR, 2023

FIRE: Fine Implicit Reconstruction Enhancement with Detailed Body Part Labels and Geometric Features.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Interactive Learning for Interpretable Visual Recognition via Semantic-Aware Self-Teaching Framework.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

2022
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding.
IEEE Trans. Neural Networks Learn. Syst., 2022

TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning.
IEEE Trans. Image Process., 2022

Enhancing Prototypical Few-Shot Learning By Leveraging The Local-Level Strategy.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Semantics-Aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition.
IEEE Trans. Image Process., 2021

CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models.
CoRR, 2021

Temporal Contrastive Graph for Self-supervised Video Representation Learning.
CoRR, 2021

Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Solving Inefficiency of Self-supervised Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Linguistically Routing Capsule Network for Out-of-distribution Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
3D Human Pose Machines with Self-Supervised Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Linguistically Driven Graph Capsule Network for Visual Question Reasoning.
CoRR, 2020

Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis.
CoRR, 2020

Grammatically Recognizing Images with Tree Convolution.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
Cost-Effective Object Detection: Active Sample Mining With Switchable Selection Criteria.
IEEE Trans. Neural Networks Learn. Syst., 2019

Instance-aware representation learning and association for online multi-person tracking.
Pattern Recognit., 2019

Adaptively Connected Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Deep Co-Space: Sample Mining Across Feature Transformation for Semi-Supervised Learning.
IEEE Trans. Circuits Syst. Video Technol., 2018

Active Self-Paced Learning for Cost-Effective and Progressive Face Identification.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Embedding Temporally Consistent Depth Recovery for Real-time Dense Mapping in Visual-inertial Odometry.
Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Convolutional Memory Blocks for Depth Data Representation Learning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Towards Human-Machine Cooperation: Self-Supervised Sample Mining for Object Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Flow Guided Recurrent Neural Encoder for Video Salient Object Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Structure-Preserving Image Super-Resolution via Contextualized Multitask Learning.
IEEE Trans. Multim., 2017

Cost-Effective Active Learning for Deep Image Classification.
IEEE Trans. Circuits Syst. Video Technol., 2017

Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning.
CoRR, 2017

Image Retrieval with Attribute-Associated Auxiliary References.
Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications, 2017

Fine-Grained Butterfly Recognition with Deep Residual Networks: A New Baseline and Benchmark.
Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications, 2017

Recurrent 3D Pose Sequence Machines.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Face Recognition via Heuristic Deep Active Learning.
Proceedings of the Biometric Recognition - 12th Chinese Conference, 2017

2016
A Deep Structured Model with Radius-Margin Bound for 3D Human Activity Recognition.
Int. J. Comput. Vis., 2016

Human Pose Estimation from Depth Images via Inference Embedded Multi-task Learning.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Learning a lightweight deep convolutional network for joint age and gender recognition.
Proceedings of the 23rd International Conference on Pattern Recognition, 2016

Local- and holistic-structure preserving image super resolution via deep joint component learning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures With Edge-Preserving Coherence.
IEEE Trans. Image Process., 2015

2014
3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

2013
PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

