Xi Wang

Orcid: 0000-0001-5442-1116

Affiliations:
  • ETH Zurich, Zurich, Switzerland


According to our database, Xi Wang authored at least 48 papers between 2012 and 2025.

Bibliography

2025
GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond.
CoRR, July, 2025

VisualChef: Generating Visual Aids in Cooking via Mask Inpainting.
CoRR, June, 2025

StateSpaceDiffuser: Bringing Long Context to Diffusion World Models.
CoRR, May, 2025

MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos.
CoRR, April, 2025

SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction.
CoRR, March, 2025

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploration-Driven Generative Interactive Environments.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Holistic Understanding of 3D Scenes as Universal Scene Description.
CoRR, 2024

Understanding the World's Museums through Vision-Language Reasoning.
CoRR, 2024

InTraGen: Trajectory-controlled Video Generation for Object Interactions.
CoRR, 2024

EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting.
CoRR, 2024

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding.
CoRR, 2024

RHOBIN Challenge: Reconstruction of Human Object Interaction.
CoRR, 2024

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention.
Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, 2024

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos.
Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, 2024

PALM: Predicting Actions through Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

Source-Free Domain-Invariant Performance Prediction.
Proceedings of the Computer Vision - ECCV 2024, 2024

ROMEO: Revisiting Optimization Methods for Reconstructing 3D Human-Object Interaction Models From Images.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

I-Design: Personalized LLM Interior Designer.
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

WANDR: Intention-guided Human Motion Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving.
CoRR, 2023

LALM: Long-Term Action Anticipation with Language Models.
CoRR, 2023

Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023.
CoRR, 2023

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction.
CoRR, 2023

Selecting which Dense Retriever to use for Zero-Shot Search.
Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 2023

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions.
Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, 2023

GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EFE: End-to-end Frame-to-Gaze Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Convolutional Persistence as a Remedy to Neural Model Analysis.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines.
CoRR, 2022

Rethinking Persistent Homology For Visual Recognition.
Proceedings of the Topological, 2022

Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors.
Proceedings of the International Conference on 3D Vision, 2022

2021
Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

EMICS'21: Eye Movements as an Interface to Cognitive State.
Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021

2020
Toward Quantifying Ambiguities in Artistic Images.
ACM Trans. Appl. Percept., 2020

EMICS'20: Eye Movements as an Interface to Cognitive State.
Proceedings of the Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020

2019
Keep It Simple: Depth-based Dynamic Adjustment of Rendering for Head-mounted Displays Decreases Visual Comfort.
ACM Trans. Appl. Percept., 2019

Center of circle after perspective transformation.
CoRR, 2019

The Mental Image Revealed by Gaze Tracking.
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2018
Tracking the gaze on objects in 3D: how do people really look at the bunny?
ACM Trans. Graph., 2018

2015
Accuracy of Monocular Gaze Tracking on 3D Geometry.
Proceedings of the Eye Tracking and Visualization, 2015

2014
Comparison of Different Color Spaces for Image Segmentation using Graph-cut.
Proceedings of the VISAPP 2014, 2014

Graph-cut segmentation of polarimetric SAR images.
Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, 2014

2012
Consistent spatio-temporal filling of disocclusions in the multiview-video-plus-depth format.
Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Depth image-based rendering with spatio-temporally consistent texture synthesis for 3-D video with global motion.
Proceedings of the 19th IEEE International Conference on Image Processing, 2012

