Wenyi Hong

According to our database, Wenyi Hong authored at least 27 papers between 2017 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification.
CoRR, March, 2026

GLM-OCR Technical Report.
CoRR, March, 2026

2025
UI2CodeN: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation.
CoRR, November, 2025

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation.
CoRR, November, 2025

Glyph: Scaling Context Windows via Visual-Text Compression.
CoRR, October, 2025

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.
CoRR, July, 2025

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

LVBench: An Extreme Long Video Understanding Benchmark.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model.
CoRR, 2024

CogVLM2: Visual Language Models for Image and Video Understanding.
CoRR, 2024

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents.
CoRR, 2024

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer.
CoRR, 2024

LVBench: An Extreme Long Video Understanding Benchmark.
CoRR, 2024

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations.
CoRR, 2024

CogVLM: Visual Expert for Pretrained Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Relay Diffusion: Unifying diffusion process across resolutions for image synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer.
Proceedings of Computer Vision - ECCV 2024, 2024

CogAgent: A Visual Language Model for GUI Agents.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
CogAgent: A Visual Language Model for GUI Agents.
CoRR, 2023

CogVLM: Visual Expert for Pretrained Language Models.
CoRR, 2023

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
CogView: Mastering Text-to-Image Generation via Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2017
Improved Approximation Algorithm for the Combination of Parallel Machine Scheduling and Vertex Cover.
Int. J. Found. Comput. Sci., 2017

