Shoubin Yu

Orcid: 0009-0006-1670-0054

According to our database¹, Shoubin Yu authored at least 29 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models.

CoRR, May, 2026

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding.

Gedas Bertasius

CoRR, May, 2026

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos.

Srinivas Sunkara

CoRR, March, 2026

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting.

CoRR, March, 2026

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution.

Nithin Sivakumaran, Elias Stengel-Eskin

CoRR, February, 2026

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning.

CoRR, February, 2026

A Novel Approach to Evaluating the Effectiveness of Large Language Models for Multimodal Analysis of Embodied Learning in Classrooms.

Joyce Horn Fonteles, Nithin Sivakumaran, Elias Stengel-Eskin

Proceedings of the LAK26: 16th International Learning Analytics and Knowledge Conference, 2026

2025

Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering.

CoRR, November, 2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models.

Taojiannan Yang, Lincoln Spencer, Serena Yeung-Levy

CoRR, October, 2025

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding.

CoRR, June, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization.

CoRR, April, 2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time.

Kalyan Sunkavalli

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

A Multimodal Classroom Video Question-Answering Framework for Automated Understanding of Collaborative Learning.

Nithin Sivakumaran, Elias Stengel-Eskin, Cindy E. Hmelo-Silver, Jonathan P. Rowe, James C. Lester

Proceedings of the 27th International Conference on Multimodal Interaction, 2025

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning.

Md Mohaiminul Islam, Gedas Bertasius

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.

Elias Stengel-Eskin, Gedas Bertasius

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level.

Taojiannan Yang, Lincoln Spencer, Ajmal Saeed Mian

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection.

IEEE Trans. Circuits Syst. Video Technol., August, 2024

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives.

CoRR, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion.

CoRR, 2024

Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition.

Jacob Zhiyuan Fang, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Simple LLM Framework for Long-Range Video Question-Answering.

Md Mohaiminul Islam, Gedas Bertasius

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Self-Chained Image-Language Model for Video Localization and Question Answering.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2021

STAR: A Benchmark for Situated Reasoning in Real-World Videos.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021
