Kanzhi Cheng

ORCID: 0009-0004-4532-1446

According to our database, Kanzhi Cheng authored at least 13 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents.
CoRR, June, 2025

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows.
CoRR, May, 2025

Vision-Language Models Can Self-Improve Reasoning via Reflection.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

OS-ATLAS: Foundation Action Model for Generalist GUI Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond.
CoRR, 2024

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

2022
ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora.
Proceedings of the Natural Language Processing and Chinese Computing, 2022
