Shoubin Yu

Orcid: 0009-0006-1670-0054

According to our database, Shoubin Yu authored at least 27 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos.
CoRR, March, 2026

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting.
CoRR, March, 2026

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution.
CoRR, February, 2026

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning.
CoRR, February, 2026

A Novel Approach to Evaluating the Effectiveness of Large Language Models for Multimodal Analysis of Embodied Learning in Classrooms.
Proceedings of the LAK26: 16th International Learning Analytics and Knowledge Conference, 2026

2025
Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering.
CoRR, November, 2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models.
CoRR, October, 2025

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time.
CoRR, June, 2025

Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding.
CoRR, June, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization.
CoRR, April, 2025

A Multimodal Classroom Video Question-Answering Framework for Automated Understanding of Collaborative Learning.
Proceedings of the 27th International Conference on Multimodal Interaction, 2025

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives.
CoRR, 2024

CREMA: Multimodal Compositional Video Reasoning via Efficient Modular Adaptation and Fusion.
CoRR, 2024

Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Simple LLM Framework for Long-Range Video Question-Answering.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Self-Chained Image-Language Model for Video Localization and Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2021
STAR: A Benchmark for Situated Reasoning in Real-World Videos.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

