Xuehai He

According to our database¹, Xuehai He authored at least 41 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior.

CoRR, May, 2026

Interleaved Vision-and-Language Generation via Generative Voken.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

2025

Self-Evolving 3D Scene Generation from a Single Image.

CoRR, December, 2025

ThetaEvolve: Test-time Learning on Open Problems.

[…], Simon Shaolei Du, […]

CoRR, November, 2025

MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator.

[…], Thivyanth Venkateswaran, […]

CoRR, October, 2025

Bridging the Gap Between Multimodal Foundation Models and World Models.

CoRR, October, 2025

GRIT: Teaching MLLMs to Think with Images.

[…], Sravana Jyothi Narayanaraju, […]

CoRR, May, 2025

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space.

CoRR, May, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example.

[…], Simon Shaolei Du, […]

CoRR, April, 2025

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GRIT: Teaching MLLMs to Think with Images.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents.

Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025), 2025

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.

[…], William Yang Wang, […]

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models.

[…], Alexander Vilesov, […], Aditya Nagachandra, […]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation.

[…], Simon Shaolei Du, […]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation.

[…], Jacob Zhiyuan Fang, Robinson Piramuthu, […], Vicente Ordonez, Gunnar A. Sigurdsson, […]

Trans. Mach. Learn. Res., 2024

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners.

[…], Pradyumna Narayana, […], William Yang Wang, […]

Trans. Mach. Learn. Res., 2024

Simultaneous Selection and Adaptation of Source Data via Four-Level Optimization.

Trans. Assoc. Comput. Linguistics, 2024

Mojito: Motion Trajectory and Intensity Control for Video Generation.

[…], Olatunji Ruwase, […]

CoRR, 2024

ComCLIP: Training-Free Compositional Image and Text Matching.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning.

[…], Michael Johnston, […], Suhaila Shakiah, […], William Yang Wang

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens.

CoRR, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners.

[…], Pradyumna Narayana, […], William Yang Wang, […]

CoRR, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.

[…], William Yang Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.

[…], Pradyumna Narayana, […], William Yang Wang

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Multimodal Graph Transformer for Multimodal Question Answering.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.

[…], Pengchuan Zhang, […]

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Parameter-efficient Fine-tuning for Vision Transformers.

[…], Pengchuan Zhang, […]

CoRR, 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models.

[…], Pradyumna Narayana, […], William Yang Wang, […]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

On the Generation of Medical Dialogs for COVID-19.

[…], Subrato Chakravorty, […]

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Towards Visual Question Answering on Pathology Images.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Pathological Visual Question Answering.

CoRR, 2020

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms.

[…], Shanghang Zhang, […]

CoRR, 2020

On the Generation of Medical Dialogues for COVID-19.

[…], Subrato Chakravorty, […]

CoRR, 2020

COVID-CT-Dataset: A CT Scan Dataset about COVID-19.

CoRR, 2020

PathVQA: 30000+ Questions for Medical Visual Question Answering.

CoRR, 2020

2019

Learned Turbo Message Passing for Affine Rank Minimization and Compressed Robust Principal Component Analysis.

IEEE Access, 2019

Learned Turbo-type Affine Rank Minimization.

Proceedings of the 11th International Conference on Wireless Communications and Signal Processing, 2019
