Mengmeng Xu

Jürgen Schmidhuber

Trans. Mach. Learn. Res., 2025

MarDini: Masked Auto-regressive Diffusion for Video Generation at Scale.

Trans. Mach. Learn. Res., 2025

Mindstorms in Natural Language-Based Societies of Mind.

Comput. Vis. Media, 2025

Learning Flow Fields in Attention for Controllable Person Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

2024

MarDini: Masked Autoregressive Diffusion for Video Generation at Scale.

CoRR, 2024

Boundary Denoising for Video Activity Localization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing.

Bodo Rosenhahn

Sen He

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Move Anything with Layered Scene Diffusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

GenTron: Diffusion Transformers for Image and Video Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks.

Christian Simon

Sen He

Amine Benhalloum

CoRR, 2023

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation.

CoRR, 2023

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation.

Raghavendra Ramachandra

Chia-Wen Lin

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ETAD: Training Action Detection End to End on a Laptop.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation.

Sauradip Nag

Xiatian Zhu

Yi-Zhe Song

CoRR, 2022

Negative Frames Matter in Egocentric Visual Query 2D Localization.

Santhosh Kumar Ramakrishnan

CoRR, 2022

ETAD: A Unified Framework for Efficient Temporal Action Detection.

CoRR, 2022

Contrastive Language-Action Pre-training for Temporal Localization.

CoRR, 2022

SegTAD: Precise Temporal Action Detection via Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks.

Proceedings of the International Conference on 3D Vision, 2022

2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Kiran K. Somasundaram

Giovanni Maria Farinella

CoRR, 2021

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization.

Xiatian Zhu

Brais Martínez

CoRR, 2021

Low-Fidelity Video Encoder Optimization for Temporal Action Localization.

Xiatian Zhu

Brais Martínez

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

VLG-Net: Video-Language Graph Matching Network for Video Grounding.

Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

Boundary-sensitive Pre-training for Temporal Localization in Videos.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Relation-aware Video Reading Comprehension for Temporal Language Grounding.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

BAOD: Budget-Aware Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

2020

G-TAD: Sub-Graph Localization for Temporal Action Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Semantic Part RCNN for Real-World Pedestrian Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Missing Labels in Object Detection.

Yancheng Bai