Dahun Kim

Ganesh Satish Mallya

Divyashree Sreepathihalli

CoRR, April, 2026

Taking Shortcuts for Categorical VQA Using Super Neurons.

CoRR, March, 2026

2025

EmbeddingGemma: Powerful and Lightweight Text Representations.

Henrique Schechter Vera

Sindhu Raghuram Panyam

Gustavo Hernández Ábrego

Sai Meher Karthik Duddu

Mojtaba Seyedhosseini

CoRR, September, 2025

Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications.

CoRR, September, 2025

Time-Scaling State-Space Models for Dense Video Captioning.

CoRR, September, 2025

Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment.

CoRR, August, 2025

Learning Visual Grounding from Generative Vision and Language Model.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation.

IEEE Robotics Autom. Lett., July, 2024

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

What's in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning.

CoRR, 2024

OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All.

CoRR, 2024

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Region-Centric Image-Language Pretraining for Open-Vocabulary Detection.

Proceedings of the Computer Vision - ECCV 2024, 2024

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

RECLIP: Resource-efficient CLIP by Training with Small Images.

Trans. Mach. Learn. Res., 2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks.

Trans. Mach. Learn. Res., 2023

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection.

CoRR, 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation.

CoRR, 2023

Memory-Aware DVFS Governing Policy for Improved Energy-Saving in the Linux Kernel.

Philkyue Shin

Seongsoo Hong

Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Contrastive Feature Masking Open-Vocabulary Vision Transformer.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Dense Pixel-Level Interpretation of Dynamic Scenes With Video Panoptic Segmentation.

IEEE Trans. Image Process., 2022

Learning Open-World Object Proposals Without Learning to Classify.

IEEE Robotics Autom. Lett., 2022

Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation.

Viswanathan Swaminathan

Henry Fuchs

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TubeFormer-DeepLab: Video Mask Transformer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

DeepLab2: A TensorFlow Library for Deep Labeling.

CoRR, 2021

The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning To Associate Every Segment for Video Panoptic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Global Context and Geometric Priors for Effective Non-Local Self-Attention.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Recurrent Temporal Aggregation Framework for Deep Video Inpainting.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

Rotationally-Consistent Novel View Synthesis for Humans.

Viswanathan Swaminathan

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video.

Viswanathan Swaminathan

Henry Fuchs

Proceedings of the Computer Vision - ECCV 2020, 2020

Video Panoptic Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting.

Proceedings of the 31st British Machine Vision Conference 2020, 2020

Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency.

Proceedings of the 27th ACM International Conference on Multimedia, 2019

Deep Video Inpainting.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles.

Donghyeon Cho

In So Kweon

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Discriminative Feature Learning for Unsupervised Video Summarization.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Learning Image Representations by Completing Damaged Jigsaw Puzzles.

Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

LinkNet: Relational Embedding for Scene Graph.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017

Two-Phase Learning for Weakly Supervised Object Localization.

CoRR, 2017

Two-Phase Learning for Weakly Supervised Object Localization.
