Wanrong Zhu

ORCID: 0009-0005-3448-0078

According to our database, Wanrong Zhu authored at least 35 papers between 2018 and 2025.

Bibliography

2025
GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.
CoRR, November, 2025

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling.
CoRR, May, 2025

Towards Visual Text Grounding of Multimodal Large Language Model.
CoRR, April, 2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension.
CoRR, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs.
CoRR, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models.
CoRR, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization.
CoRR, 2024

Multimodal Procedural Planning via Dual Text-Image Prompting.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.
CoRR, 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use.
CoRR, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models.
CoRR, 2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality.
CoRR, 2023

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning.
CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neuro-Symbolic Procedural Planning with Commonsense Prompting.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

2022
Beyond Sub-Gaussian Noises: Sharp Concentration Analysis for Stochastic Gradient Descent.
J. Mach. Learn. Res., 2022

CLIP also Understands Text: Prompting CLIP for Phrase Understanding.
CoRR, 2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting.
CoRR, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Imagination-Augmented Natural Language Understanding.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

End-to-end Dense Video Captioning as Sequence Generation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions.
CoRR, 2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019
Text Infilling.
CoRR, 2019

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.
CoRR, 2018

