Ajay Divakaran

Orcid: 0000-0003-0371-5346

According to our database, Ajay Divakaran authored at least 127 papers between 1995 and 2024.


IEEE Fellow

IEEE Fellow 2011, "For contributions to multimedia content analysis".






Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification.
CoRR, 2024

Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories.
CoRR, 2024

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

BloomVQA: Assessing Hierarchical Multi-modal Comprehension.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval.
CoRR, 2023

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback.
CoRR, 2023

Confidence Calibration for Systems with Cascaded Predictive Modules.
CoRR, 2023

Probing Conceptual Understanding of Large Visual-Language Models.
CoRR, 2023

Predicting Information Pathways Across Online Communities.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Detecting Trojaned DNNs Using Counterfactual Attributions.
Proceedings of the IEEE International Conference on Assured Autonomy, 2023

Broadening AI Ethics Narratives: An Indic Art View.
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

Multilingual Content Moderation: A Case Study on Reddit.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Unpacking Large Language Models with Conceptual Consistency.
CoRR, 2022

Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2.
CoRR, 2022

Detecting out-of-context objects using contextual cues.
CoRR, 2022

Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Detecting Out-Of-Context Objects Using Graph Contextual Reasoning Network.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Hybrid Consistency Training with Prototype Adaptation for Few-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games.
Proceedings of the Second International Conference on AI-ML Systems, 2022

Towards Solving Multimodal Comprehension.
CoRR, 2021

Modular Adaptation for Cross-Domain Few-Shot Learning.
CoRR, 2021

Knowing What VQA Does Not: Pointing to Error-Inducing Regions to Improve Explanation Helpfulness.
CoRR, 2021

Comprehension Based Question Answering using Bloom's Taxonomy.
Proceedings of the 6th Workshop on Representation Learning for NLP, 2021

Confidence Calibration for Domain Generalization under Covariate Shift.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction.
Proceedings of the 2021 9th International Conference on Affective Computing and Intelligent Interaction, 2021

Zero-Shot Learning with Knowledge Enhanced Visual Semantic Embeddings.
CoRR, 2020

Lifelong Learning using Eigentasks: Task Separation, Skill Acquisition, and Selective Transfer.
CoRR, 2020

Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks.
CoRR, 2020

Progressive Growing of Neural ODEs.
CoRR, 2020

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

Fine-Tuning for One-Look Regression Vehicle Counting in Low-Shot Aerial Datasets.
Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges, 2020

FoodX-251: A Dataset for Fine-grained Food Classification.
CoRR, 2019

Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks.
CoRR, 2019

Data-Efficient Mutual Information Neural Estimator.
CoRR, 2019

Lucid Explanations Help: Using a Human-AI Image-Guessing Game to Evaluate Machine Explanation Helpfulness.
CoRR, 2019

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Can You Explain That? Lucid Explanations Help Human-AI Collaborative Image Retrieval.
Proceedings of the Seventh AAAI Conference on Human Computation and Crowdsourcing, 2019

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Deep Multimodal Fusion: A Hybrid Approach.
Int. J. Comput. Vis., 2018

Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention.
CoRR, 2018

Zero-Shot Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Combining Weakly and Webly Supervised Learning for Classifying Food Images.
CoRR, 2017

Multimodal analytics to study collaborative problem solving in pair programming.
Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, 2016

Human Social Interaction Modeling Using Temporal Deep Networks.
CoRR, 2015

2nd Workshop on Computational Models of Social Interactions: Human-Computer-Media Communication (HCMC2015).
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Exploiting Multimodal Affect and Semantics to Identify Politically Persuasive Web Videos.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

Audio-based affect detection in web videos.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

The Tower Game Dataset: A multimodal dataset for analyzing social interaction predicates.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

Multimodal fusion using dynamic hybrid models.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2014

SRI-Sarnoff AURORA System at TRECVID 2014 Multimedia Event Detection and Recounting.
Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Emotion detection in speech using deep networks.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Video event recognition using concept attributes.
Proceedings of the 2013 IEEE Workshop on Applications of Computer Vision, 2013

SRI-Sarnoff AURORA System at TRECVID 2013 Multimedia Event Detection and Recounting.
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

Semantic pooling for complex event detection.
Proceedings of the ACM Multimedia Conference, 2013

Affect analysis in natural human interaction using Joint Hidden Conditional Random Fields.
Proceedings of the 2013 IEEE International Conference on Multimedia and Expo, 2013

Dynamic Pooling for Complex Event Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2013

Leveraging a Generalized Tutoring Framework in Exploratory Simulations Of Ill-Defined Domains.
Proceedings of the Workshops at the 16th International Conference on Artificial Intelligence in Education AIED 2013, 2013

On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks.
Int. J. Multim. Data Eng. Manag., 2012

SRI-Sarnoff AURORA System at TRECVID 2012 Multimedia Event Detection and Recounting.
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Multimedia event recounting with concept based representation.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

How to put it into words - using random forests to extract symbol level descriptions from audio content for concept detection.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Evaluation of low-level features and their combinations for complex event detection in open source videos.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012


On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval.
Proceedings of the 2011 IEEE International Symposium on Multimedia, 2011

Recognition and volume estimation of food intake using a mobile device.
Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV 2009), 2009

Speech denoising using nonnegative matrix factorization with priors.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Detection of music segment boundaries using audio-visual features for a personal video recorder.
IEEE Trans. Consumer Electron., 2007

An SVM Framework for Genre-Independent Scene Change Detection.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

An enhanced video summarization system using audio features for a personal video recorder.
IEEE Trans. Consumer Electron., 2006

A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia.
EURASIP J. Adv. Signal Process., 2006

Sports Program Boundary Detection.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Broadcast Video Program Summarization using Face Tracks.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Generative Process Tracking for Audio Analysis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A highlight scene detection and video summarization system using audio feature for a personal video recorder.
IEEE Trans. Consumer Electron., 2005

Modeling sports highlights using a time-series clustering framework and model interpretation.
Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2005, 2005

Highlights extraction from sports video based on an audio-visual marker detection framework.
Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, 2005

Layered dynamic mixture model for pattern discovery in asynchronous multi-modal streams [video applications].
Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

Systematic acquisition of audio classes for elevator surveillance.
Proceedings of the Electronic Imaging: Image and Video Communications and Processing 2005, 2005

Structure analysis of soccer video with domain knowledge and hidden Markov models.
Pattern Recognit. Lett., 2004

Framework for measurement of the intensity of motion activity of video segments.
J. Vis. Commun. Image Represent., 2004

MPEG-7 meta-data enhanced encoder system for embedded systems.
Proceedings of the Visual Communications and Image Processing 2004, 2004

Audio-visual event detection based on mining of semantic audio-visual labels.
Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2004, 2004

Video mining using combinations of unsupervised and supervised learning techniques.
Proceedings of the Storage and Retrieval Methods and Applications for Multimedia 2004, 2004

Audio-Assisted Video Browsing for DVD Recorders.
Proceedings of the Advances in Multimedia Information Processing - PCM 2004, 5th Pacific Rim Conference on Multimedia, Tokyo, Japan, November 30, 2004

A time series clustering based framework for multimedia mining and summarization using audio features.
Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2004

Effective and efficient sports highlights extraction using the minimum description length criterion in selecting GMM structures.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Time series analysis and segmentation using eigenvectors for mining semantic audio label sequences.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Adaptive fast playback-based video skimming using a compressed-domain visual complexity measure.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Towards maximizing the end-user experience.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Discovering meaningful multimedia patterns with audio-visual concepts and associated text.
Proceedings of the 2004 International Conference on Image Processing, 2004

Video mining: pattern discovery versus pattern recognition.
Proceedings of the 2004 International Conference on Image Processing, 2004

Survey of compressed-domain features used in audio-visual indexing and analysis.
J. Vis. Commun. Image Represent., 2003

Procedure for audio-assisted browsing of news video using generalized sound recognition.
Proceedings of the Storage and Retrieval for Media Databases 2003, 2003

Automatic extraction of soccer video highlights using a combination of motion and audio features.
Proceedings of the Storage and Retrieval for Media Databases 2003, 2003

Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Multi-camera calibration, object tracking and query generation.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

Generation of sports highlights using motion activity in combination with a common audio feature extraction framework.
Proceedings of the 2003 International Conference on Image Processing, 2003

Feature selection for unsupervised discovery of statistical temporal structures in video.
Proceedings of the 2003 International Conference on Image Processing, 2003

Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

Rapid generation of sports video highlights using the MPEG-7 motion activity descriptor.
Proceedings of the Storage and Retrieval for Media Databases 2002, 2002

Data reduction procedure for principal cast and other talking head detection.
Proceedings of the Storage and Retrieval for Media Databases 2002, 2002

Representation of motion activity in hierarchical levels for video indexing and filtering.
Proceedings of the 2002 International Conference on Image Processing, 2002

Motion activity-based extraction of key-frames from video shots.
Proceedings of the 2002 International Conference on Image Processing, 2002

Structure analysis of soccer video with hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2002

MPEG-7 visual motion descriptors.
IEEE Trans. Circuits Syst. Video Technol., 2001

Video summarization using descriptors of motion activity: A motion activity based approach to key-frame extraction from video shots.
J. Electronic Imaging, 2001

Automatic measurement of intensity of motion activity of video segments.
Proceedings of the Storage and Retrieval for Media Databases 2001, 2001

Video summarization using motion descriptors.
Proceedings of the Storage and Retrieval for Media Databases 2001, 2001

A Motion Activity Descriptor and Its Extraction in Compressed Domain.
Proceedings of the Advances in Multimedia Information Processing, 2001

Algorithms And System For Segmentation And Structure Analysis In Soccer Video.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

A Novel Pair-Wise Comparison Based Analytical Framework For Automatic Measurement Of Intensity Of Motion Activity Of Video Segments.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

Constant pace skimming and temporal sub-sampling of video using motion activity.
Proceedings of the 2001 International Conference on Image Processing, 2001

An Overview of MPEG-7 Motion Descriptors and Their Applications.
Proceedings of the Computer Analysis of Images and Patterns, 9th International Conference, 2001

Video browsing system based on compressed domain feature extraction.
IEEE Trans. Consumer Electron., 2000

Descriptor for spatial distribution of motion activity for compressed video.
Proceedings of the Storage and Retrieval for Media Databases 2000, 2000

Fade-in/out scene change detection in the MPEG-1/2/4 compressed video domain.
Proceedings of the Storage and Retrieval for Media Databases 2000, 2000

Efficient Representation and Comparison of Multimedia Content using DAG-Composition.
Proceedings of the 2000 IEEE International Conference on Multimedia and Expo, 2000

A Region Based Descriptor for Spatial Distribution of Motion Activity for Compressed Video.
Proceedings of the 2000 International Conference on Image Processing, 2000

Scene change detection and feature extraction for MPEG-4 sequences.
Proceedings of the Storage and Retrieval for Image and Video Databases VII, 1999

Video compression by mean-corrected motion compensation of partial quadtrees.
IEEE Trans. Circuits Syst. Video Technol., 1997

Information-theoretic performance of quadrature mirror filters.
IEEE Trans. Inf. Theory, 1995
