Yifei Huang

ORCID: 0000-0001-8067-6227

Affiliations:
  • University of Tokyo, Sato Laboratory, Tokyo, Japan
  • Shanghai AI Lab, Shanghai, China
  • Shanghai Jiao Tong University, China (until 2015)


According to our database, Yifei Huang authored at least 64 papers between 2017 and 2025.

Bibliography

2025
Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., September, 2025

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs.
CoRR, July, 2025

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision.
CoRR, June, 2025

Egocentric Action-aware Inertial Localization in Point Clouds.
CoRR, May, 2025

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining.
CoRR, May, 2025

Learning Streaming Video Representation via Multitask Training.
CoRR, April, 2025

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant.
CoRR, March, 2025

AutoGaze: A Very Initial Exploration in A SAM2-based Pipeline for Automated Eye-Object Interaction Analysis in First-Person Videos.
Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces, 2025

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Matching Compound Prototypes for Few-Shot Action Recognition.
Int. J. Comput. Vis., September, 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.
CoRR, 2024

Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation.
CoRR, 2024

Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

ActionVOS: Actions as Prompts for Video Object Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Retrieval-Augmented Egocentric Video Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Memory-and-Anticipation Transformer for Online Action Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Weakly Supervised Temporal Sentence Grounding with Uncertainty-Guided Self-training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

First Bite/Chew: distinguish different types of food by first biting/chewing and the corresponding hand movement.
Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

Proposal-based Temporal Action Localization with Point-level Supervision.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

First Bite/Chew: distinguish typical allergic food by two IMUs.
Proceedings of the Augmented Humans International Conference 2023, 2023

2022
Spatio-Temporal Perturbations for Video Attribution.
IEEE Trans. Circuits Syst. Video Technol., 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Precise Affordance Annotation for Egocentric Action Video Datasets.
CoRR, 2022

Seeing our Blind Spots: Smart Glasses-based Simulation to Increase Design Students' Awareness of Visual Impairment.
Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 2022

Inner self drawing machine.
Proceedings of the SIGGRAPH Asia 2022 Art Gallery, 2022

GazeSync: Eye Movement Transfer Using an Optical Eye Tracker and Monochrome Liquid Crystal Displays.
Proceedings of the IUI 2022: 27th International Conference on Intelligent User Interfaces, 2022

Compound Prototype Matching for Few-Shot Action Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022

Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report.
CoRR, 2021

Towards Visually Explaining Video Understanding Networks with Perturbation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Precise Multi-Modal In-Hand Pose Estimation using Low-Precision Sensors for Robotic Assembly.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Goal-Oriented Gaze Estimation for Zero-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Mutual Context Network for Jointly Estimating Egocentric Gaze and Action.
IEEE Trans. Image Process., 2020

An Ego-Vision System for Discovering Human Joint Attention.
IEEE Trans. Hum. Mach. Syst., 2020

Learn to Extract Building Outline from Misaligned Annotation through Nearest Feature Selector.
Remote. Sens., 2020

A Comprehensive Study on Visual Explanations for Spatio-temporal Networks.
CoRR, 2020

Learn to Recover Visible Color for Video Surveillance in a Day.
Proceedings of the Computer Vision - ECCV 2020, 2020

Improving Action Segmentation via Graph-Based Temporal Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions.
CoRR, 2019

Manipulation-Skill Assessment from Videos with Spatial Attention Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

2018
Predicting Gaze in Egocentric Video by Learning Task-Dependent Attention Transition.
Proceedings of the Computer Vision - ECCV 2018, 2018

Semantic Aware Attention Based Deep Object Co-segmentation.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

