Yang Shi

ORCID: 0009-0003-9241-236X

Affiliations:
  • Peking University, Beijing, China


According to our database, Yang Shi authored at least 26 papers between 2025 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models.
CoRR, April, 2026

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
CoRR, April, 2026

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining.
CoRR, March, 2026

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models.
CoRR, February, 2026

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks.
CoRR, February, 2026

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models.
CoRR, January, 2026

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation.
CoRR, January, 2026

Detecting Unobserved Confounders: A Kernelized Regression Approach.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models.
CoRR, December, 2025

Hybrid Attribution Priors for Explainable and Robust Model Training.
CoRR, December, 2025

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling.
CoRR, December, 2025

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss.
CoRR, December, 2025

Monet: Reasoning in Latent Visual Space Beyond Images and Language.
CoRR, November, 2025

When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs.
CoRR, November, 2025

MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning.
CoRR, October, 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration.
CoRR, October, 2025

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing.
CoRR, September, 2025

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark.
CoRR, September, 2025

BaseReward: A Strong Baseline for Multimodal Reward Model.
CoRR, September, 2025

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks.
CoRR, June, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios.
CoRR, May, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models.
CoRR, April, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.
CoRR, February, 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Debiasing Multimodal Large Language Models via Penalization of Language Priors.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment.
Proceedings of the Forty-second International Conference on Machine Learning, 2025
