Xi Wang

ORCID: 0000-0001-5442-1116

Affiliations:

ETH Zurich, Zurich, Switzerland

According to our database, Xi Wang authored at least 51 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

GaussianVLM: Scene-Centric 3D Vision-Language Models Using Language-Aligned Gaussian Splats for Embodied Reasoning and Beyond.

IEEE Robotics Autom. Lett., December, 2025

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation.

CoRR, October, 2025

LLM Agents Beyond Utility: An Open-Ended Perspective.

Asen Nachkov

Xi Wang

Luc Van Gool

CoRR, October, 2025

VisualChef: Generating Visual Aids in Cooking via Mask Inpainting.

CoRR, June, 2025

StateSpaceDiffuser: Bringing Long Context to Diffusion World Models.

CoRR, May, 2025

MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos.

Robert K. Katzschmann

Marc Pollefeys

CoRR, April, 2025

SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction.

CoRR, March, 2025

Leveraging Gradient Information for Out-of-Domain Performance Estimations.

Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2025

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Exploration-Driven Generative Interactive Environments.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting.

Proceedings of the International Conference on 3D Vision, 2025

2024

Holistic Understanding of 3D Scenes as Universal Scene Description.

CoRR, 2024

Understanding the World's Museums through Vision-Language Reasoning.

Ada-Astrid Balauca

Sanjana Garai

Stefan Balauca

Rasesh Udayakumar Shetty

Naitik Agrawal

Dhwanil Subhashbhai Shah

CoRR, 2024

InTraGen: Trajectory-controlled Video Generation for Object Interactions.

CoRR, 2024

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding.

CoRR, 2024

RHOBIN Challenge: Reconstruction of Human Object Interaction.

CoRR, 2024

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention.

Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, 2024

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos.

Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, 2024

PALM: Predicting Actions through Language Models.

Proceedings of the Computer Vision - ECCV 2024, 2024

Source-Free Domain-Invariant Performance Prediction.

Proceedings of the Computer Vision - ECCV 2024, 2024

ROMEO: Revisiting Optimization Methods for Reconstructing 3D Human-Object Interaction Models From Images.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

I-Design: Personalized LLM Interior Designer.

Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

WANDR: Intention-guided Human Motion Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

G-MEMP: Gaze-Enhanced Multimodal Ego-Motion Prediction in Driving.

CoRR, 2023

LALM: Long-Term Action Anticipation with Language Models.

CoRR, 2023

Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023.

CoRR, 2023

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction.

CoRR, 2023

Selecting which Dense Retriever to use for Zero-Shot Search.

Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 2023

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions.

Nikola Popovic

Dimitrios Christodoulou

Danda Pani Paudel

Xi Wang

Luc Van Gool

Proceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct, 2023

GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EFE: End-to-end Frame-to-Gaze Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Convolutional Persistence as a Remedy to Neural Model Analysis.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines.

CoRR, 2022

Rethinking Persistent Homology For Visual Recognition.

Proceedings of the Topological, 2022

Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors.

Proceedings of the International Conference on 3D Vision, 2022

2021

Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

EMICS'21: Eye Movements as an Interface to Cognitive State.

Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021

2020

Toward Quantifying Ambiguities in Artistic Images.

ACM Trans. Appl. Percept., 2020

EMICS'20: Eye Movements as an Interface to Cognitive State.

Proceedings of the Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020

2019

Keep It Simple: Depth-based Dynamic Adjustment of Rendering for Head-mounted Displays Decreases Visual Comfort.

Jochen Jacobs

Xi Wang

Marc Alexa

ACM Trans. Appl. Percept., 2019

Center of circle after perspective transformation.

Xi Wang

Albert Chern

Marc Alexa

CoRR, 2019

The Mental Image Revealed by Gaze Tracking.

Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019

2018

Tracking the gaze on objects in 3D: how do people really look at the bunny?

ACM Trans. Graph., 2018

2015

Accuracy of Monocular Gaze Tracking on 3D Geometry.

Proceedings of the Eye Tracking and Visualization, 2015

2014

Comparison of Different Color Spaces for Image Segmentation using Graph-cut.

Proceedings of the VISAPP 2014, 2014

Graph-cut segmentation of polarimetric SAR images.

Ronny Hänsch

Olaf Hellwich

Xi Wang

Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, 2014

2012

Consistent spatio-temporal filling of disocclusions in the multiview-video-plus-depth format.

Proceedings of the 14th IEEE International Workshop on Multimedia Signal Processing, 2012

Depth image-based rendering with spatio-temporally consistent texture synthesis for 3-D video with global motion.

Proceedings of the 19th IEEE International Conference on Image Processing, 2012
