Wanrong Zhu

Orcid: 0009-0005-3448-0078

According to our database, Wanrong Zhu authored at least 35 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

CoRR, November, 2025

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling.

CoRR, May, 2025

Towards Visual Text Grounding of Multimodal Large Language Model.

CoRR, April, 2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension.

CoRR, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs.

CoRR, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models.

CoRR, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization.

CoRR, 2024

Multimodal Procedural Planning via Dual Text-Image Prompting.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.

CoRR, 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use.

CoRR, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models.

CoRR, 2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality.

Ziyang Wei

Wanrong Zhu

Wei Biao Wu

CoRR, 2023

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning.

Xinyi Wang

Wanrong Zhu

William Yang Wang

CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neuro-Symbolic Procedural Planning with Commonsense Prompting.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

2022

Beyond Sub-Gaussian Noises: Sharp Concentration Analysis for Stochastic Gradient Descent.

Wanrong Zhu

Zhipeng Lou

Wei Biao Wu

J. Mach. Learn. Res., 2022

CLIP also Understands Text: Prompting CLIP for Phrase Understanding.

CoRR, 2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting.

CoRR, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Imagination-Augmented Natural Language Understanding.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

End-to-end Dense Video Captioning as Sequence Generation.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions.

Wanrong Zhu

Xi Chen

Wei Biao Wu

CoRR, 2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019

Text Infilling.

Wanrong Zhu

Zhiting Hu

Eric P. Xing

CoRR, 2019

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.

Devendra Singh Sachan

Eric P. Xing

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.

Devendra Singh Sachan

Eric P. Xing

CoRR, 2018
