Mengmeng Xu

Orcid: 0000-0001-9152-4632

Affiliations:
  • King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia


According to our database, Mengmeng Xu authored at least 33 papers between 2019 and 2025.

Bibliography

2025
Ego4D: Around the World in 3,600 Hours of Egocentric Video.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2025

Faster Diffusion Through Temporal Attention Decomposition.
Trans. Mach. Learn. Res., 2025

MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale.
Trans. Mach. Learn. Res., 2025

Mindstorms in Natural Language-Based Societies of Mind.
Comput. Vis. Media, 2025

Learning Flow Fields in Attention for Controllable Person Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale.
CoRR, 2024

Boundary Denoising for Video Activity Localization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Move Anything with Layered Scene Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GenTron: Diffusion Transformers for Image and Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks.
CoRR, 2023

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation.
CoRR, 2023

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ETAD: Training Action Detection End to End on a Laptop.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation.
CoRR, 2022

Negative Frames Matter in Egocentric Visual Query 2D Localization.
CoRR, 2022

ETAD: A Unified Framework for Efficient Temporal Action Detection.
CoRR, 2022

Contrastive Language-Action Pre-training for Temporal Localization.
CoRR, 2022

SegTAD: Precise Temporal Action Detection via Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022


LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks.
Proceedings of the International Conference on 3D Vision, 2022

2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization.
CoRR, 2021

Low-Fidelity Video Encoder Optimization for Temporal Action Localization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

VLG-Net: Video-Language Graph Matching Network for Video Grounding.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Boundary-sensitive Pre-training for Temporal Localization in Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Relation-aware Video Reading Comprehension for Temporal Language Grounding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

BAOD: Budget-Aware Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

2020
G-TAD: Sub-Graph Localization for Temporal Action Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Semantic Part RCNN for Real-World Pedestrian Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Missing Labels in Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

