Omkar Thawakar

According to our database, Omkar Thawakar authored at least 32 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
CoVR-R: Reason-Aware Composed Video Retrieval.
CoRR, March, 2026

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device.
CoRR, February, 2026

A Multi-Agent Diffusion Approach for MRI Anomaly Segmentation via Modality-Specific LoRA Specialization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding.
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2025
Thinking Beyond Labels: Vocabulary-Free Fine-Grained Recognition using Reasoning-Augmented LMMs.
CoRR, December, 2025

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards.
CoRR, November, 2025

How Good are Foundation Models in Step-by-Step Embodied Reasoning?
CoRR, September, 2025

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark.
CoRR, May, 2025

LLM Post-Training: A Deep Dive into Reasoning Large Language Models.
CoRR, February, 2025

AIN: The Arabic INclusive Large Multimodal Model.
CoRR, February, 2025

Video Instance Segmentation in an Open-World.
Int. J. Comput. Vis., January, 2025

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2025

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Vocabulary-Free Fine-Grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2025, 2025

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages.
CoRR, 2024

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark.
CoRR, 2024

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration.
CoRR, 2024

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT.
CoRR, 2024

Composed Video Retrieval via Enriched Context and Discriminative Embeddings.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models.
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, 2024

2023
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
CoRR, 2023

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Fast Video Instance Segmentation via Recurrent Encoder-Based Transformers.
Proceedings of the Computer Analysis of Images and Patterns, 2023

2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

2019
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Image and Video Super Resolution using Recurrent Generative Adversarial Network.
Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2019
