Makarand Tapaswi

Narayanan C. Krishnan

Proceedings of the 22nd IEEE International Symposium on Biomedical Imaging, 2025

The Sound of Water: Inferring Physical Properties from Pouring Liquids.

Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, 2025

What You See is What You Ask: Evaluating Audio Descriptions.

Divy Kala

Eshika Khandelwal

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Investigating Mechanisms for In-Context Vision Language Binding.

Darshana Saravanan

Vineet Gandhi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark.

CoRR, 2024

Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation.

Manu Gaur

Darshan Singh S

CoRR, 2024

Major Entity Identification: A Generalizable Alternative to Coreference Resolution.

CoRR, 2024

VELOCITI: Can Video-Language Models Bind Semantic Concepts through Time?

CoRR, 2024

FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos.

Darshan Singh S

Zeeshan Khan

CoRR, 2024

Major Entity Identification: A Generalizable Alternative to Coreference Resolution.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

"Previously on..." from Recaps to Story Summarization.

Aditya Kumar Singh

Dhruv Srivastava

Siri Venkata Pavan Kumar Kandru

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MICap: A Unified Model for Identity-Aware Movie Descriptions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Eye vs. AI: Human Gaze and Model Attention in Video Memorability.

CoRR, 2023

GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering.

Dhaval Taunk

Lakshya Khanna

Vasudeva Varma

Charu Sharma

Companion Proceedings of the ACM Web Conference 2023, 2023

Unsupervised Audio-Visual Lecture Segmentation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

How You Feelin'? Learning Emotions and Mental States in Movie Scenes.

Dhruv Srivastava

Aditya Kumar Singh

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Test of Time: Instilling Video-Language Models with a Sense of Time.

Piyush Bagad

Cees G. M. Snoek

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Can we Adopt Self-supervised Pretraining for Chest X-Rays?

Arsh Verma

CoRR, 2022

Grounded Video Situation Recognition.

Zeeshan Khan

C. V. Jawahar

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding.

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations.

Jaidev Shriram

Vinoo Alluri

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Learning Object Manipulation Skills from Video via Approximate Differentiable Physics.

Vladimír Petrík

Mohammad Nomaan Qureshi

Josef Sivic

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation.

Computer Vision - ECCV 2022, 2022

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Instruction-driven history-aware policies for robotic manipulations.

Proceedings of the Conference on Robot Learning, 2022

2021

Long term spatio-temporal modeling for action detection.

Vineeth N. Balasubramanian

Vijay Kumar

Ivan Laptev

Comput. Vis. Image Underst., 2021

Feature generation for long-tail classification.

Rahul Vigneswaran

Marc T. Law

Proceedings of the ICVGIP '21: Indian Conference on Computer Vision, Graphics and Image Processing, Jodhpur, India, December 19, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Video Face Clustering With Self-Supervised Representation Learning.

IEEE Trans. Biom. Behav. Identity Sci., 2020

Deep Multimodal Feature Encoding for Video Ordering.

Vivek Sharma

CoRR, 2020

Clustering based Contrastive Learning for Improving Face Representations.

Proceedings of the 15th IEEE International Conference on Automatic Face and Gesture Recognition, 2020

Learning Interactions and Relationships Between Movie Characters.

Anna Kukleva

Ivan Laptev

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos.

Proceedings of the 4th Conference on Robot Learning, 2020

2019

The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries.

CoRR, 2019

Visual Reasoning by Progressive Module Networks.

Seung Wook Kim

Proceedings of the 7th International Conference on Learning Representations, 2019

Video Face Clustering With Unknown Number of Clusters.

Marc T. Law

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.

Antoine Miech

Dimitri Zhukov

Jean-Baptiste Alayrac

Ivan Laptev

Josef Sivic

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Learning of Face Representations for Video Face Clustering.

Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

2018

Progressive Reasoning by Module Composition.

Seung Wook Kim

CoRR, 2018

Now You Shake Me: Towards Automatic 4D Cinema.

Yuhao Zhou

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MovieGraphs: Towards Understanding Human-Centric Situations From Videos.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Situation Recognition with Graph Neural Networks.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Story Understanding through Semantic Analysis and Automatic Alignment of Text and Video

PhD thesis, 2016

Relaxed Earth Mover's Distances for Chain- and Tree-connected Spaces and their use as a Loss Function in Deep Learning.

Manuel Martínez

Monica-Laura Haurilet

Ziad Al-Halah

CoRR, 2016

Naming TV characters by watching and analyzing dialogs.

Monica-Laura Haurilet

Ziad Al-Halah

Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, 2016

MovieQA: Understanding Stories in Movies through Question-Answering.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning.

Ziad Al-Halah

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Aligning plot synopses to videos for story-based retrieval.

Int. J. Multim. Inf. Retr., 2015

Accio: A Data Set for Face Track Retrieval in Movies Across Age.

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

KIT at MediaEval 2015 - Evaluating Visual Cues for Affective Impact of Movies Task.

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Improved weak labels using contextual cues for person identification in videos.

Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2015

Book2Movie: Aligning video scenes with book chapters.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Story-based Video Retrieval in TV series using Plot Synopses.

Proceedings of the International Conference on Multimedia Retrieval, 2014

Total Cluster: A person agnostic clustering method for broadcast videos.

Proceedings of the 2014 Indian Conference on Computer Vision, 2014

Cleaning up after a face tracker: False positive removal.

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

StoryGraphs: Visualizing Character Interactions as a Timeline.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

A time pooled track kernel for person identification.

Proceedings of the 11th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2014

2013

QCompere @ REPERE 2013.

Achintya Kumar Sarkar

Proceedings of the First Workshop on Speech, Language and Audio in Multimedia, 2013

Semi-supervised Learning with Constraints for Person Identification in Multimedia Data.

Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012

KIT at MediaEval 2012 - Content-based Genre Classification with Visual Cues.

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Fusion of Speech, Faces and Text for Person Identification in TV Broadcast.

Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012

"Knock! Knock! Who is it?" probabilistic person identification in TV-series.

Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Contextual Constraints for Person Retrieval in Camera Networks.

Proceedings of the Ninth IEEE International Conference on Advanced Video and Signal-Based Surveillance, 2012

2008

Multilingual spoken-password based user authentication in emerging economies using cellular phone networks.

Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Audio-Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles.

Amitava Das

Ohil K. Manyam