Miao Liu

Christoph Feichtenhofer

IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Learning Predictive Visuomotor Coordination.

CoRR, March, 2025

Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication.

IEEE Trans. Neural Networks Learn. Syst., January, 2025

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs.

Sigmund Vanvalkenburgh

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss.

IEEE Trans. Circuits Syst. Video Technol., August, 2024

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond.

Int. J. Comput. Vis., 2024

Human Action Anticipation: A Survey.

CoRR, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.

CoRR, 2024

Animated Stickers: Bringing Stickers to Life with Video Diffusion.

CoRR, 2024

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Visually Guided Binaural Audio Generation with Cross-Modal Consistency.

Proceedings of the IEEE International Conference on Acoustics, 2024

Non-Intrusive Speech Quality Assessment with Multi-Task Learning Based on Tensor Network.

Proceedings of the IEEE International Conference on Acoustics, 2024

Listen to Look Into the Future: Audio-Visual Egocentric Gaze Anticipation.

Proceedings of the Computer Vision - ECCV 2024, 2024

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning.

Proceedings of the Computer Vision - ECCV 2024, 2024

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective.

Wenqi Jia

Hao Jiang

Ishwarya Ananthabhotla

Vamsi Krishna Ithapu

Ruohan Gao

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.

Triantafyllos Afouras

Oluwatumininu Oguntola

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

In the Eye of the Beholder: Gaze and Actions in First Person Video.

Yin Li

IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games.

Shirley Anugrah Hayati

Diyi Yang

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games.

Shirley Anugrah Hayati

Diyi Yang

CoRR, 2022

BIT-MI Deep Learning-based Model to Non-intrusive Speech Quality Assessment Challenge in Online Conferencing Applications.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Binaural Sound Source Localization based on Neural Networks in Mismatched HRTF Condition.

Proceedings of the ICCAI '22: 8th International Conference on Computing and Artificial Intelligence, Tianjin, China, March 18, 2022

MOS Predictor for Synthetic Speech with I-Vector Inputs.

Proceedings of the IEEE International Conference on Acoustics, 2022

Egocentric Activity Recognition and Localization on a 3D Map.

Lingni Ma

Proceedings of the Computer Vision - ECCV 2022, 2022

Generative Adversarial Network for Future Hand Segmentation from Egocentric Video.

Wenqi Jia

Proceedings of the Computer Vision - ECCV 2022, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Christoph Feichtenhofer

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Neural network-based non-intrusive speech quality assessment using attention pooling function.

EURASIP J. Audio Speech Music. Process., 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Christoph Feichtenhofer

CoRR, 2021

Frequency Axis Pooling Method for Weakly Labeled Sound Event Detection and Classification.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

4D Human Body Capture from Egocentric Video via 3D Scene Grounding.

Proceedings of the International Conference on 3D Vision, 2021

2020

SyncWISE: Window Induced Shift Estimation for Synchronization of Video and Accelerometry from Wearable Sensors.

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2020

Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video.

Proceedings of the Computer Vision - ECCV 2020, 2020

Attention Distillation for Learning Video Representations.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019

Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity.

CoRR, 2019

Paying More Attention to Motion: Attention Distillation for Learning Video Representations.

CoRR, 2019

2018

In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video.

Yin Li