Shuhuai Ren

Orcid: 0009-0001-9998-864X

According to our database, Shuhuai Ren authored at least 41 papers between 2009 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation.
CoRR, April, 2026

TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation.
CoRR, December, 2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report.
CoRR, November, 2025

MiMo-VL Technical Report.
CoRR, June, 2025

MiMo: Unlocking the Reasoning Potential of Language Model - From Pretraining to Posttraining.
CoRR, May, 2025

TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment.
CoRR, March, 2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
CoRR, March, 2025

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling.
CoRR, February, 2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos.
Proceedings of the 33rd ACM International Conference on Multimedia, 2025

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Parallelized Autoregressive Visual Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.
CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models.
CoRR, 2024

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality.
CoRR, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TempCompass: Do Video LLMs Really Understand Videos?
Proceedings of the Findings of the Association for Computational Linguistics, 2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.
CoRR, 2023

M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning.
CoRR, 2023

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Delving into the Openness of CLIP.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Rethinking the Openness of CLIP.
CoRR, 2022

2021
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark.
CoRR, 2021

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Dynamic Knowledge Distillation for Pre-trained Language Models.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Learning Relation Alignment for Calibrated Cross-modal Retrieval.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Accelerating Pre-trained Language Models via Calibrated Cascade.
CoRR, 2020

DCA: Diversified Co-attention Towards Informative Live Video Commenting.
Proceedings of the Natural Language Processing and Chinese Computing, 2020

2019
Diversified Co-Attention towards Informative Live Video Commenting.
CoRR, 2019

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2010
The Virtual Learning Commons Architecture Based on Semantic Technologies.
Proceedings of the New Horizons in Web-Based Learning - ICWL 2010 Workshops, 2010

2009
From information commons to knowledge commons: Building a collaborative knowledge sharing environment for innovative communities.
Electron. Libr., 2009

