Mohamed Elhoseiny

ORCID: 0000-0001-9659-1551

According to our database¹, Mohamed Elhoseiny authored at least 62 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

iMotion-LLM: Instruction-Conditioned Trajectory Generation.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation.

Proceedings of the Advances in Information Retrieval, 2026

Step-by-step Layered Design Generation.

Balaji Vasan Srinivasan

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

3DCoMPaT++: An Improved Large-Scale 3D Vision Dataset for Compositional Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2025

Aberration-Aware Depth-From-Focus.

IEEE Trans. Pattern Anal. Mach. Intell., September, 2025

Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Temporal Model-Based Federated Active Medical Image Classification.

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2025, 2025

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

4D-Bench: Benchmarking Multi-Modal Large Language Models for 4D Object Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Diffusion-Based Imaginative Coordination for Bimanual Manipulation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Aurelia: Test-Time Reasoning Distillation in Audio-Visual LLMs.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

StoryGPT-V: Large Language Models as Consistent Story Visualizers.

Xiaoqian Shen

Mohamed Elhoseiny

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

A Hybrid Graph Network for Complex Activity Detection in Video.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Multimodal Representation and Retrieval [MRR 2024].

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Uni3DL: A Unified Model for 3D Vision-Language Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations.

Proceedings of the Computer Vision - ECCV 2024, 2024

MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time.

Proceedings of the Computer Vision - ECCV 2024, 2024

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos.

Proceedings of the Computer Vision - ECCV 2024, 2024

Overcoming Generic Knowledge Loss with Selective Parameter Update.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ShapeWalk: Compositional Shape Editing Through Language-Guided Chains.

Habib Slim

Mohamed Elhoseiny

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Adversarial Text to Continuous Image Generation.

Chamuditha Jayanga Galappaththige

Mohamed Elhoseiny

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EmoTalker: Audio Driven Emotion Aware Talking Head Generation.

Xiaoqian Shen

Faizan Farooq Khan

Mohamed Elhoseiny

Proceedings of the Computer Vision - ACCV 2024, 2024

ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Continual Zero-Shot Learning through Semantically Guided Generative Random Walks.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Efficiently Disentangle Causal Representations.

CoRR, 2022

3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification.

Proceedings of the Computer Vision - ECCV 2022, 2022

Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation.

Proceedings of the Computer Vision - ECCV 2022, 2022

3D CoMPaT: Composition of Materials on Parts of 3D Things.

Proceedings of the Computer Vision - ECCV 2022, 2022

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2.

Ivan Skorokhodov

Sergey Tulyakov

Mohamed Elhoseiny

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Domain-Aware Continual Zero-Shot Learning.

Kai Yi

Mohamed Elhoseiny

CoRR, 2021

RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory.

CoRR, 2021

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation.

CoRR, 2021

Aligning Latent and Image Spaces to Connect the Unconnectable.

Ivan Skorokhodov

Grigorii Sotnikov

Mohamed Elhoseiny

CoRR, 2021

VisualGPT: Data-efficient Image Captioning by Balancing Visual Input and Linguistic Knowledge from Pretraining.

CoRR, 2021

Exploring Long Tail Visual Relationship Recognition with Large Vocabulary.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adversarial Generation of Continuous Images.

Ivan Skorokhodov

Savva Ignatyev

Mohamed Elhoseiny

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

ArtEmis: Affective Language for Visual Art.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation.

CoRR, 2020

Normalization Matters in Zero-Shot Learning.

Ivan Skorokhodov

Mohamed Elhoseiny

CoRR, 2020

Inner Ensemble Nets.

Abduallah A. Mohamed

Muhammed Mohaimin Sadiq

Ehab AlBadawy

Mohamed Elhoseiny

Christian G. Claudel

CoRR, 2020

Efficient long-distance relation extraction with DG-SpanBERT.

CoRR, 2020

Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss.

CoRR, 2020

ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes.

Proceedings of the Computer Vision - ECCV 2020, 2020

Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Uncertainty-guided Continual Learning with Bayesian Neural Networks.

CoRR, 2019

Semi-Supervised Few-Shot Learning with Local and Global Consistency.

CoRR, 2019
