Mohamed Elhoseiny

ORCID: 0000-0001-9659-1551

According to our database, Mohamed Elhoseiny authored at least 62 papers between 2019 and 2026.

Bibliography

2026
iMotion-LLM: Instruction-Conditioned Trajectory Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation.
Proceedings of the Advances in Information Retrieval, 2026

Step-by-step Layered Design Generation.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
3DCoMPaT++: An Improved Large-Scale 3D Vision Dataset for Compositional Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2025

Aberration-Aware Depth-From-Focus.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Temporal Model-Based Federated Active Medical Image Classification.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

4D-Bench: Benchmarking Multi-Modal Large Language Models for 4D Object Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Diffusion-Based Imaginative Coordination for Bimanual Manipulation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Aurelia: Test-Time Reasoning Distillation in Audio-Visual LLMs.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

StoryGPT-V: Large Language Models as Consistent Story Visualizers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
A Hybrid Graph Network for Complex Activity Detection in Video.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Multimodal Representation and Retrieval [MRR 2024].
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Uni3DL: A Unified Model for 3D Vision-Language Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations.
Proceedings of the Computer Vision - ECCV 2024, 2024

MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time.
Proceedings of the Computer Vision - ECCV 2024, 2024

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

Overcoming Generic Knowledge Loss with Selective Parameter Update.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ShapeWalk: Compositional Shape Editing Through Language-Guided Chains.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Adversarial Text to Continuous Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EmoTalker: Audio Driven Emotion Aware Talking Head Generation.
Proceedings of the Computer Vision - ACCV 2024, 2024

ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Continual Zero-Shot Learning through Semantically Guided Generative Random Walks.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Efficiently Disentangle Causal Representations.
CoRR, 2022

3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

3D CoMPaT: Composition of Materials on Parts of 3D Things.
Proceedings of the Computer Vision - ECCV 2022, 2022

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Domain-Aware Continual Zero-Shot Learning.
CoRR, 2021

RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory.
CoRR, 2021

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation.
CoRR, 2021

Aligning Latent and Image Spaces to Connect the Unconnectable.
CoRR, 2021

VisualGPT: Data-efficient Image Captioning by Balancing Visual Input and Linguistic Knowledge from Pretraining.
CoRR, 2021

Exploring Long Tail Visual Relationship Recognition with Large Vocabulary.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adversarial Generation of Continuous Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ArtEmis: Affective Language for Visual Art.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation.
CoRR, 2020

Normalization Matters in Zero-Shot Learning.
CoRR, 2020

Inner Ensemble Nets.
CoRR, 2020

Efficient long-distance relation extraction with DG-SpanBERT.
CoRR, 2020

Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss.
CoRR, 2020

ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes.
Proceedings of the Computer Vision - ECCV 2020, 2020

Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Uncertainty-guided Continual Learning with Bayesian Neural Networks.
CoRR, 2019

Semi-Supervised Few-Shot Learning with Local and Global Consistency.
CoRR, 2019

