Kaitong Cai

According to our database, Kaitong Cai authored at least 31 papers between 2025 and 2026.

Bibliography

2026
Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE.
CoRR, March, 2026

AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents.
CoRR, February, 2026

Process-of-Thought Reasoning for Videos.
CoRR, February, 2026

Spectral Gating Networks.
CoRR, February, 2026

Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems.
CoRR, January, 2026

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Top-Down Semantic Refinement for Image Captioning.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

RaCoT: Plug-and-Play Contrastive Example Generation Mechanism for Enhanced LLM Reasoning Reliability.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains.
CoRR, December, 2025

CoAgent: Collaborative Planning and Consistency Agent for Coherent Video Generation.
CoRR, December, 2025

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks.
CoRR, December, 2025

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models.
CoRR, December, 2025

SirenPose: Dynamic Scene Reconstruction via Geometric Supervision.
CoRR, December, 2025

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction.
CoRR, December, 2025

PTTA: A Pure Text-to-Animation Framework for High-Quality Creation.
CoRR, December, 2025

STORM: Search-Guided Generative World Models for Robotic Manipulation.
CoRR, December, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models.
CoRR, December, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models.
CoRR, December, 2025

Causal Invariance and Counterfactual Learning Driven Cooperative Game for Multi-Label Classification.
CoRR, December, 2025

Guardian: Decoupling Exploration from Safety in Reinforcement Learning.
CoRR, October, 2025

Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints.
CoRR, October, 2025

Learning Dynamics of VLM Finetuning.
CoRR, October, 2025

Failure-Driven Workflow Refinement.
CoRR, October, 2025

CF-VLM: CounterFactual Vision-Language Fine-tuning.
CoRR, June, 2025

Kolmogorov-Arnold Fourier Networks.
CoRR, February, 2025

MAT-Agent: Adaptive Multi-Agent Training Optimization.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

CCG: Rare-Label Prediction via Neural SEM-Driven Causal Game.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

