Omkar Thawakar

According to our database, Omkar Thawakar authored at least 25 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications.
CoRR, August, 2025

Vocabulary-free Fine-grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model.
CoRR, July, 2025

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs.
CoRR, May, 2025

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark.
CoRR, May, 2025

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding.
CoRR, March, 2025

LLM Post-Training: A Deep Dive into Reasoning Large Language Models.
CoRR, February, 2025

AIN: The Arabic INclusive Large Multimodal Model.
CoRR, February, 2025

Video Instance Segmentation in an Open-World.
Int. J. Comput. Vis., January, 2025

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025


LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages.
CoRR, 2024

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark.
CoRR, 2024

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration.
CoRR, 2024

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT.
CoRR, 2024

Composed Video Retrieval via Enriched Context and Discriminative Embeddings.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models.
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, 2024

2023
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
CoRR, 2023

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Fast Video Instance Segmentation via Recurrent Encoder-Based Transformers.
Proceedings of the Computer Analysis of Images and Patterns, 2023

2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

2019
Motion Saliency Based Generative Adversarial Network for Underwater Moving Object Segmentation.
Proceedings of the 2019 IEEE International Conference on Image Processing, 2019

Image and Video Super Resolution using Recurrent Generative Adversarial Network.
Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2019

