Tai Wang

This is a disambiguation page: it lists papers by multiple people with the same or a similar name.

Bibliography

2026
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning.
CoRR, March, 2026

Demystifying Action Space Design for Robotic Manipulation Policies.
CoRR, February, 2026

RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation.
CoRR, February, 2026

Nimbus: A Unified Embodied Synthetic Data Generation Framework.
CoRR, January, 2026

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation.
CoRR, January, 2026

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs.
CoRR, December, 2025

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry.
CoRR, December, 2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence.
CoRR, December, 2025

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation.
CoRR, December, 2025

Exploring the impact of the Big Five personality traits on cognitive performance in scientific reasoning: an ordered network analysis.
Cogn. Process., November, 2025

G²VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning.
CoRR, November, 2025

ChangingGrounding: 3D Visual Grounding in Changing Scenes.
CoRR, October, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy.
CoRR, October, 2025

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model.
CoRR, October, 2025

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts.
CoRR, September, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation.
CoRR, July, 2025

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding.
CoRR, July, 2025

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling.
CoRR, July, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation.
CoRR, June, 2025

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence.
CoRR, May, 2025

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents.
CoRR, May, 2025

NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance.
CoRR, May, 2025

Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence.
CoRR, February, 2025

Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection.
CoRR, February, 2025

Position-Guided Point Cloud Panoptic Segmentation Transformer.
Int. J. Comput. Vis., January, 2025

Towards Latency-Aware 3D Streaming Perception for Autonomous Driving.
Proceedings of the IEEE International Conference on Robotics and Automation, 2025

LLaVA-3D: A Simple Yet Effective Pathway to Empowering LMMs with 3D Capabilities.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

VFLowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Gleam: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Language-to-Space Programming for Training-Free 3D Visual Grounding.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Vision-Centric BEV Perception: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Optimizing learning return on investment: Identifying learning strategies based on user behavior characteristic in language learning applications.
Educ. Inf. Technol., April, 2024

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness.
CoRR, 2024

GRUtopia: Dream General Robots in a City at Scale.
CoRR, 2024

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation.
CoRR, 2024

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models.
CoRR, 2024

Grounded 3D-LLM with Referent Tokens.
CoRR, 2024

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Human-Scene Interaction via Prompted Chain-of-Contacts.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities.
Proceedings of the Computer Vision - ECCV 2024, 2024

Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation.
Proceedings of the Computer Vision - ECCV 2024, 2024

PointLLM: Empowering Large Language Models to Understand Point Clouds.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning to Adapt SAM for Segmenting Cross-Domain Point Clouds.
Proceedings of the Computer Vision - ECCV 2024, 2024

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

2023
SAM-guided Unsupervised Domain Adaptation for 3D Segmentation.
CoRR, 2023

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection.
CoRR, 2023

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.
CoRR, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scene as Occupancy.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking.
Proceedings of the Conference on Robot Learning, 2023

2022
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

A novel active learning method for profust reliability analysis based on the Kriging model.
Eng. Comput., 2022

Vision-Centric BEV Perception: A Survey.
CoRR, 2022

MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Monocular 3D Object Detection with Depth from Motion.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Lkurtogram Guided Adaptive Empirical Wavelet Transform and Purified Instantaneous Energy Operation for Fault Diagnosis of Wind Turbine Bearing.
IEEE Trans. Instrum. Meas., 2021

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion.
CoRR, 2021

Disclosing Personal Names in Screen Names Predicts Better Final Achievement Levels in Massive Open Online Courses.
IEEE Access, 2021

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Implementation of equipment maintenance and assembly assistance system based on augmented reality.
Proceedings of the EITCE 2021: 5th International Conference on Electronic Information Technology and Computer Engineering, Xiamen, China, October 22, 2021

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Probabilistic and Geometric Depth: Detecting Objects in Perspective.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

2020
Na₂CO₃-responsive Photosynthetic and ROS Scavenging Mechanisms in Chloroplasts of Alkaligrass Revealed by Phosphoproteomics.
Genom. Proteom. Bioinform., 2020

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based Point Clouds.
Proceedings of the UIST '20 Adjunct: The 33rd Annual ACM Symposium on User Interface Software and Technology, 2020

Coreference Resolution Improves Educational Knowledge Graph Construction.
Proceedings of the 2020 IEEE International Conference on Knowledge Graph, 2020

SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds.
Proceedings of the Computer Vision - ECCV 2020, 2020

Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds.
Proceedings of the 4th Conference on Robot Learning, 2020

2019
Temporal emotion-aspect modeling for discovering what students are concerned about in online course forums.
Interact. Learn. Environ., 2019

2018
Characterizing Concept Conveying in Interactions between MOOC Students and Assistants.
Proceedings of the IEEE International Conference on Teaching, 2018

An Emotion Oriented Topic Modeling Approach to Discover What Students are Concerned about in Course Forums.
Proceedings of the 18th IEEE International Conference on Advanced Learning Technologies, 2018

2016
Sentiment recognition of online course reviews using multi-swarm optimization-based selected features.
Neurocomputing, 2016

An Empirical Study on Academic Commentary and Its Implications on Reading and Writing.
CoRR, 2016

2015
Web Content Rating Algorithm Based on Network Community Structure and its Performance Analysis.
计算机科学 (Computer Science), 2015

Mood disorder patients' language features on their microblogs.
Int. J. Embed. Syst., 2015

2013
Sentiment Recognition of Online Chinese Micro Movie Reviews Using Multiple Probabilistic Reasoning Model.
J. Comput., 2013

2012
SOLARCAP: Super capacitor buffering of solar energy for self-sustainable field systems.
Proceedings of the IEEE 25th International SOC Conference, 2012

2007
A Minimized Latency Broadcast in Multi-Rate Wireless Mesh Networks: Distributed Formulation and Rate First Algorithm.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

A Fast Broadcast Tree Construction in Multi-Rate Wireless Mesh Networks.
Proceedings of IEEE International Conference on Communications, 2007

2006
A dynamic caching algorithm based on internal popularity distribution of streaming media.
Multim. Syst., 2006

Variable Rate Caching for Video Delivery in Heterogeneous Environment.
Proceedings of IEEE International Conference on Communications, 2006

Internal popularity of streaming video and its implication on caching.
Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA 2006), 2006

