Ali Vosoughi

Orcid: 0000-0003-1014-2937

According to our database¹, Ali Vosoughi authored at least 20 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend:

Book In proceedings Article PhD thesis Dataset Other

Links

On csauthors.net:

Bibliography

2026

Video Understanding With Large Language Models: A Survey.

[BibT_eX]

[DOI]

IEEE Trans. Circuits Syst. Video Technol., February, 2026

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting.

[BibT_eX]

[DOI]

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching.

[BibT_eX]

[DOI]

CoRR, October, 2025

Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward.

[BibT_eX]

[DOI]

CoRR, October, 2025

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models.

[BibT_eX]

[DOI]

CoRR, October, 2025

OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering.

[BibT_eX]

[DOI]

CoRR, July, 2025

Can Sound Replace Vision in LLaVA With Token Substitution?

[BibT_eX]

[DOI]

CoRR, June, 2025

I<sup>2</sup>G: Generating Instructional Illustrations via Text-Conditioned Diffusion.

[BibT_eX]

[DOI]

CoRR, May, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity.

[BibT_eX]

[DOI]

CoRR, March, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model.

[BibT_eX]

[DOI]

Ali Vosoughi

Dimitra Emmanouilidou

Hannes Gamper

Proceedings of the 33rd European Signal Processing Conference, 2025

2024

Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA.

[BibT_eX]

[DOI]

IEEE Trans. Multim., 2024

OSCaR: Object State Captioning and State Change Representation.

[BibT_eX]

[DOI]

CoRR, 2024

OSCaR: Object State Captioning and State Change Representation.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

EAGLE: Egocentric AGgregated Language-video Engine.

[BibT_eX]

[DOI]

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Learning Audio Concepts from Counterfactual Natural Language.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Video Understanding with Large Language Models: A Survey.

[BibT_eX]

[DOI]

CoRR, 2023

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation.

[BibT_eX]

[DOI]

CoRR, 2023

MISAR: A Multimodal Instructional System with Augmented Reality.

[BibT_eX]

[DOI]

CoRR, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.

[BibT_eX]

[DOI]

CoRR, 2023

Ali Vosoughi

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...