Zhaokai Wang

According to our database, Zhaokai Wang authored at least 25 papers between 2019 and 2025.

Bibliography

2025
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models.
CoRR, July, 2025

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
CoRR, April, 2025

Vision-to-Music Generation: A Survey.
CoRR, March, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation.
CoRR, March, 2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding.
CoRR, January, 2025

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Drifting Ionospheric Scintillation Simulation for L-Band Geosynchronous SAR.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation.
CoRR, 2024

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning.
CoRR, 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training.
CoRR, 2024

Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning.
CoRR, 2024

Parameter-Inverted Image Pyramid Networks.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ItiNera: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Video Background Music Generation: Dataset, Method and Evaluation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Video Background Music Generation: Dataset, Method and Evaluation.
CoRR, 2022

2021
Video Background Music Generation with Controllable Music Transformer.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2019
An adaptive template matching-based single object tracking algorithm with parallel acceleration.
J. Vis. Commun. Image Represent., 2019

An Efficient Density-Based Local Outlier Detection Approach for Scattered Data.
IEEE Access, 2019

ADCMO: An Anomaly Detection Approach Based on Local Outlier Factor for Continuously Monitored Object.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

Deeper Monocular Depth Prediction via Long and Short Skip Connection.
Proceedings of the International Joint Conference on Neural Networks, 2019

LogGOPSC: A Parallel Computation Model Extending Network Contention into LogGOPS.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
