Xuehai He

According to our database, Xuehai He authored at least 41 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior.
CoRR, May, 2026

Interleaved Vision-and-Language Generation via Generative Voken.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

2025
Self-Evolving 3D Scene Generation from a Single Image.
CoRR, December, 2025

ThetaEvolve: Test-time Learning on Open Problems.
CoRR, November, 2025

MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator.
CoRR, October, 2025

Bridging the Gap Between Multimodal Foundation Models and World Models.
CoRR, October, 2025

GRIT: Teaching MLLMs to Think with Images.
CoRR, May, 2025

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space.
CoRR, May, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example.
CoRR, April, 2025

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Reinforcement Learning for Reasoning in Large Language Models with One Training Example.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

GRIT: Teaching MLLMs to Think with Images.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents.
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025), 2025

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation.
Trans. Mach. Learn. Res., 2024

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners.
Trans. Mach. Learn. Res., 2024

Simultaneous Selection and Adaptation of Source Data via Four-Level Optimization.
Trans. Assoc. Comput. Linguistics, 2024

Mojito: Motion Trajectory and Intensity Control for Video Generation.
CoRR, 2024

ComCLIP: Training-Free Compositional Image and Text Matching.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens.
CoRR, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners.
CoRR, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Multimodal Graph Transformer for Multimodal Question Answering.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Parameter-Efficient Model Adaptation for Vision Transformers.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Parameter-efficient Fine-tuning for Vision Transformers.
CoRR, 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
On the Generation of Medical Dialogs for COVID-19.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Towards Visual Question Answering on Pathology Images.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Pathological Visual Question Answering.
CoRR, 2020

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms.
CoRR, 2020

On the Generation of Medical Dialogues for COVID-19.
CoRR, 2020

COVID-CT-Dataset: A CT Scan Dataset about COVID-19.
CoRR, 2020

PathVQA: 30000+ Questions for Medical Visual Question Answering.
CoRR, 2020

2019
Learned Turbo Message Passing for Affine Rank Minimization and Compressed Robust Principal Component Analysis.
IEEE Access, 2019

Learned Turbo-type Affine Rank Minimization.
Proceedings of the 11th International Conference on Wireless Communications and Signal Processing, 2019

