Miao Liu

Orcid: 0000-0002-2039-2051

Affiliations:
  • META GenAI


According to our database, Miao Liu authored at least 38 papers between 2018 and 2025.

Bibliography

2025
Learning Predictive Visuomotor Coordination.
CoRR, March, 2025

Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication.
IEEE Trans. Neural Networks Learn. Syst., January, 2025

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond.
Int. J. Comput. Vis., 2024

Human Action Anticipation: A Survey.
CoRR, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.
CoRR, 2024

Animated Stickers: Bringing Stickers to Life with Video Diffusion.
CoRR, 2024

ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Visually Guided Binaural Audio Generation with Cross-Modal Consistency.
Proceedings of the IEEE International Conference on Acoustics, 2024

Non-Intrusive Speech Quality Assessment with Multi-Task Learning Based on Tensor Network.
Proceedings of the IEEE International Conference on Acoustics, 2024

Listen to Look Into the Future: Audio-Visual Egocentric Gaze Anticipation.
Proceedings of the Computer Vision - ECCV 2024, 2024

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning.
Proceedings of the Computer Vision - ECCV 2024, 2024

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
In the Eye of the Beholder: Gaze and Actions in First Person Video.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games.
CoRR, 2022

BIT-MI Deep Learning-based Model to Non-intrusive Speech Quality Assessment Challenge in Online Conferencing Applications.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Binaural Sound Source Localization based on Neural Networks in Mismatched HRTF Condition.
Proceedings of the ICCAI '22: 8th International Conference on Computing and Artificial Intelligence, Tianjin, China, March 18, 2022

MOS Predictor for Synthetic Speech with I-Vector Inputs.
Proceedings of the IEEE International Conference on Acoustics, 2022

Egocentric Activity Recognition and Localization on a 3D Map.
Proceedings of the Computer Vision - ECCV 2022, 2022

Generative Adversarial Network for Future Hand Segmentation from Egocentric Video.
Proceedings of the Computer Vision - ECCV 2022, 2022


In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Neural network-based non-intrusive speech quality assessment using attention pooling function.
EURASIP J. Audio Speech Music. Process., 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

Frequency Axis Pooling Method for Weakly Labeled Sound Event Detection and Classification.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

4D Human Body Capture from Egocentric Video via 3D Scene Grounding.
Proceedings of the International Conference on 3D Vision, 2021

2020
SyncWISE: Window Induced Shift Estimation for Synchronization of Video and Accelerometry from Wearable Sensors.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2020

Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video.
Proceedings of the Computer Vision - ECCV 2020, 2020

Attention Distillation for Learning Video Representations.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019
Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity.
CoRR, 2019

Paying More Attention to Motion: Attention Distillation for Learning Video Representations.
CoRR, 2019

2018
In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video.
Proceedings of the Computer Vision - ECCV 2018, 2018

