Xiaojian Ma

ORCID: 0000-0001-5609-3822

Affiliations:

State Key Laboratory of General Artificial Intelligence, BIGAI, China

According to our database, Xiaojian Ma authored at least 63 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning.

CoRR, September, 2025

NEP: Autoregressive Image Editing via Next Editing Token Prediction.

CoRR, August, 2025

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation.

CoRR, July, 2025

LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation.

CoRR, June, 2025

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes.

CoRR, June, 2025

FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation.

CoRR, May, 2025

Iterative Trajectory Exploration for Multimodal Agents.

CoRR, April, 2025

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials.

CoRR, April, 2025

Building LLM Agents by Incorporating Insights from Computer Systems.

CoRR, April, 2025

JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Falcon: Fast Visuomotor Policies via Partial Denoising.

CoRR, March, 2025

LongViTU: Instruction Tuning for Long-Form Video Understanding.

CoRR, January, 2025

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding.

CoRR, January, 2025

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

GROOT-2: Weakly Supervised Multimodal Instruction Following Agents.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents.

CoRR, 2024

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting.

CoRR, 2024

Task-oriented Sequential Grounding in 3D Scenes.

CoRR, 2024

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space.

CoRR, 2024

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting.

CoRR, 2024

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding.

CoRR, 2024

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation.

CoRR, 2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-modal Situated Reasoning in 3D Scenes.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MindAgent: Emergent Gaming Interaction.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

An Embodied Generalist Agent in 3D World.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GROOT: Learning to Follow Instructions by Watching Gameplay Videos.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unifying 3D Vision-Language Understanding via Promptable Queries.

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoAgent: A Memory-Augmented Multimodal Agent for Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

MindAgent: Emergent Gaming Interaction.

CoRR, 2023

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents.

CoRR, 2023

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SQA3D: Situated Question Answering in 3D Scenes.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation.

CoRR, 2022

Latent Diffusion Energy-Based Model for Interpretable Text Modeling.

CoRR, 2022

Latent Diffusion Energy-Based Model for Interpretable Text Modelling.

Proceedings of the International Conference on Machine Learning, 2022

RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

HALMA: Humanlike Abstraction Learning Meets Affordance in Rapid Problem Solving.

CoRR, 2021

Unsupervised Foreground Extraction via Deep Region Competition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Adversarial Option-Aware Hierarchical Imitation Learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Robust Robotic Pouring using Audition and Haptics.

CoRR, 2020

Robust Robotic Pouring using Audition and Haptics.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Reinforcement Learning from Imperfect Demonstrations under Soft Expert Guidance.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring.

CoRR, 2019

Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring.

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

PointNetGPD: Detecting Grasp Configurations from Point Sets.

Proceedings of the International Conference on Robotics and Automation, 2019

Vision-based Teleoperation of Shadow Dexterous Hand using End-to-End Deep Neural Network.

Proceedings of the International Conference on Robotics and Automation, 2019

Task Transfer by Preference-Based Cost Learning.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Learning and Inference Movement with Deep Generative Model.

CoRR, 2018

Adversarial Task Transfer from Preference.

CoRR, 2018
