Wanrong Zhu

Orcid: 0009-0005-3448-0078

According to our database¹, Wanrong Zhu authored at least 42 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Conformalized Percentile Interval: Finite Sample Validity and Improved Conditional Performance.

Ran Zou

Wanrong Zhu

Bin Nan

CoRR, May, 2026

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction.

CoRR, April, 2026

AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization.

Jiawei Lin

Wanrong Zhu

Vlad I. Morariu

Christopher Tensmeyer

CoRR, March, 2026

A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression.

Dongyue Xie

Wanrong Zhu

Matthew Stephens

CoRR, January, 2026

MiLDEdit: Reasoning-Based Multi-Layer Design Document Editing.

Christopher Tensmeyer

CoRR, January, 2026

OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Text-Conditioned Background Generation for Editable Multi-Layer Documents.

CoRR, December, 2025

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

CoRR, November, 2025

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling.

CoRR, May, 2025

Towards Visual Text Grounding of Multimodal Large Language Model.

CoRR, April, 2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension.

CoRR, 2024

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs.

CoRR, 2024

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models.

CoRR, 2024

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization.

CoRR, 2024

Multimodal Procedural Planning via Dual Text-Image Prompting.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.

CoRR, 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use.

CoRR, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models.

CoRR, 2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality.

Ziyang Wei

Wanrong Zhu

Wei Biao Wu

CoRR, 2023

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning.

Xinyi Wang

Wanrong Zhu

William Yang Wang

CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Neuro-Symbolic Procedural Planning with Commonsense Prompting.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

2022

Beyond Sub-Gaussian Noises: Sharp Concentration Analysis for Stochastic Gradient Descent.

Wanrong Zhu

Zhipeng Lou

Wei Biao Wu

J. Mach. Learn. Res., 2022

CLIP also Understands Text: Prompting CLIP for Phrase Understanding.

CoRR, 2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting.

CoRR, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Imagination-Augmented Natural Language Understanding.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

End-to-end Dense Video Captioning as Sequence Generation.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020

A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions.

Wanrong Zhu

Xi Chen

Wei Biao Wu

CoRR, 2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019

Text Infilling.

Wanrong Zhu

Zhiting Hu

Eric P. Xing

CoRR, 2019

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.

Devendra Singh Sachan

Eric P. Xing

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation.

Devendra Singh Sachan

Eric P. Xing

CoRR, 2018
