Shoubin Yu

ORCID: 0009-0006-1670-0054

According to our database, Shoubin Yu authored at least 20 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models.

CoRR, October, 2025

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning.

CoRR, July, 2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time.

CoRR, June, 2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation.

CoRR, June, 2025

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding.

CoRR, June, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization.

CoRR, April, 2025

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation.

CoRR, March, 2025

A Multimodal Classroom Video Question-Answering Framework for Automated Understanding of Collaborative Learning.

Cindy E. Hmelo-Silver

Jonathan P. Rowe

James C. Lester

Mohit Bansal

Proceedings of the 27th International Conference on Multimodal Interaction, 2025

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion.

Shoubin Yu

Jaehong Yoon

Mohit Bansal

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection.

IEEE Trans. Circuits Syst. Video Technol., August, 2024

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives.

Jaehong Yoon

Shoubin Yu

Mohit Bansal

CoRR, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion.

Shoubin Yu

Jaehong Yoon

Mohit Bansal

CoRR, 2024

Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Simple LLM Framework for Long-Range Video Question-Answering.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Self-Chained Image-Language Model for Video Localization and Question Answering.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2021

STAR: A Benchmark for Situated Reasoning in Real-World Videos.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021
