Gedas Bertasius

CoRR, 2024

Augmented Reality Demonstrations for Scalable Robot Imitation Learning.

CoRR, 2024

DAM: Dynamic Adapter Merging for Continual Video QA Learning.

CoRR, 2024

Video ReCap: Recursive Captioning of Hour-Long Videos.

CoRR, 2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences.

CoRR, 2024

2023

MuMUR: Multilingual Multimodal Universal Retrieval.

Avinash Madasu

Estelle Aflalo

Gabriela Ben Melech Stan

Inf. Retr. J., June, 2023

A Simple LLM Framework for Long-Range Video Question-Answering.

CoRR, 2023

RGNet: A Unified Retrieval and Grounding Network for Long Videos.

CoRR, 2023

LoCoNet: Long-Short Context Network for Active Speaker Detection.

CoRR, 2023

Unified Coarse-to-Fine Alignment for Video-Text Retrieval.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Improving Video Retrieval Using Multilingual Knowledge Transfer.

Avinash Madasu

Estelle Aflalo

Gabriela Ben Melech Stan

Shao-Yen Tseng

Vasudev Lal

Proceedings of the Advances in Information Retrieval, 2023

Vision Transformers are Parameter-Efficient Audio-Visual Learners.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Efficient Movie Scene Detection using State-Space Transformers.

Md Mohaiminul Islam

Mahmudul Hasan

Kishan Shamsundar Athrey

Tony Braskich

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VindLU: A Recipe for Effective Video-and-Language Pretraining.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism.

Md Mohaiminul Islam

CoRR, 2022

Learning to Retrieve Videos by Asking Questions.

Avinash Madasu

Junier Oliva

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

EclipSE: Efficient Long-Range Video Retrieval Using Sight and Sound.

Proceedings of the Computer Vision - ECCV 2022, 2022

Long Movie Clip Classification with State-Space Video Models.

Md Mohaiminul Islam

Proceedings of the Computer Vision - ECCV 2022, 2022

TallFormer: Temporal Action Localization with a Long-Memory Transformer.

Feng Cheng

Proceedings of the Computer Vision - ECCV 2022, 2022

Long-Short Temporal Contrastive Learning of Video Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning To Recognize Procedural Activities with Distant Supervision.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Supervoxel Attention Graphs for Long-Range Video Modeling.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Is Space-Time Attention All You Need for Video Understanding?

Heng Wang

Proceedings of the 38th International Conference on Machine Learning, 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

COBE: Contextualized Object Embeddings from Narrated Instructional Video.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attentive Action and Context Factorization.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019

Learning Temporal Pose Estimation from Sparsely-Labeled Videos.

Christoph Feichtenhofer

Du Tran

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018

Learning Discriminative Motion Features Through Detection.

Christoph Feichtenhofer

Du Tran

CoRR, 2018

Object Detection in Video with Spatiotemporal Sampling Networks.

Proceedings of the Computer Vision - ECCV 2018, 2018

Egocentric Basketball Motion Planning From a Single First-Person Image.

Aaron Chan

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

First-Person Action-Object Detection with EgoNet.

Proceedings of the Robotics: Science and Systems XIII, 2017

Using Cross-Model EgoSupervision to Learn Cooperative Basketball Intention.

Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Am I a Baller? Basketball Performance Assessment from First-Person Videos.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Unsupervised Learning of Important Objects from First-Person Videos.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Convolutional Random Walk Networks for Semantic Image Segmentation.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Local Perturb-and-MAP for Structured Prediction.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016

Exploiting Visual-Spatial First-Person Co-Occurrence for Action-Object Detection without Labels.

Stella X. Yu

CoRR, 2016

Am I a Baller? Basketball Skill Assessment using First-Person Cameras.

CoRR, 2016

Automatic Lymph Node Cluster Segmentation Using Holistically-Nested Neural Networks and Structured Optimization in CT Images.

Proceedings of the Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016, 2016

Semantic Segmentation with Boundary Neural Fields.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Exploiting Egocentric Object Prior for 3D Saliency Detection.

Hyun Soo Park

CoRR, 2015

High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and Its Applications to High-Level Vision.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

DeepEdge: A multi-scale bifurcated deep network for top-down contour detection.
