Muhammad Maaz

CoRR, June, 2025

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding.

CoRR, April, 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model.

CoRR, March, 2025

Palo: A Polyglot Large Multimodal Model for 5B People.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation.

Ming-Hsuan Yang

IEEE Trans. Medical Imaging, September, 2024

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding.

Salman Khan

CoRR, 2024

GLaMM: Pixel Grounding Large Multimodal Model.

Sahal Shaji Mullappilly

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models.

Salman Khan

Fahad Khan

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models.

Shehan Munasinghe

Rusiru Thushara

Salman Khan

Mubarak Shah

CoRR, 2023

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications.

Ming-Hsuan Yang

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Fine-tuned CLIP Models are Efficient Video Learners.

Muhammad Uzair Khattak

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MaPLe: Multi-modal Prompt Learning.

Muhammad Uzair Khattak

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection.

Muhammad Uzair Khattak

Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications.

Computer Vision - ECCV 2022 Workshops, 2022

Class-Agnostic Object Detection with Multi-modal Transformer.

Computer Vision - ECCV 2022, 2022

2021

Multi-modal Transformers Excel at Class-agnostic Object Detection.

CoRR, 2021

Self-Supervised Learning for Fine-Grained Visual Categorization.
