Wenqi Shao
Orcid: 0000-0003-3781-4086
According to our database, Wenqi Shao authored at least 121 papers between 2019 and 2025.
Bibliography
2025
From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models.
CoRR, August, 2025
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams.
CoRR, August, 2025
CoRR, July, 2025
CoRR, July, 2025
CoRR, July, 2025
TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models.
IEEE Trans. Big Data, June, 2025
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models.
CoRR, June, 2025
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation.
CoRR, June, 2025
Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images.
CoRR, June, 2025
CoRR, June, 2025
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation.
CoRR, June, 2025
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision.
CoRR, May, 2025
CoRR, May, 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models.
CoRR, April, 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models.
CoRR, April, 2025
CoRR, April, 2025
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models.
CoRR, March, 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification.
CoRR, March, 2025
CoRR, March, 2025
FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model.
CoRR, March, 2025
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning.
CoRR, March, 2025
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation.
CoRR, March, 2025
CoRR, January, 2025
B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions.
IEEE Trans. Inf. Forensics Secur., 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
2024
Int. J. Comput. Vis., December, 2024
CoRR, 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning.
CoRR, 2024
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception.
CoRR, 2024
DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation.
CoRR, 2024
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation.
CoRR, 2024
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents.
CoRR, 2024
CoRR, 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping.
CoRR, 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression.
CoRR, 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation.
CoRR, 2024
CoRR, 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models.
CoRR, 2024
AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions.
CoRR, 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation.
CoRR, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024
CoRR, 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
CoRR, 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Computer Vision - ECCV 2024 Workshops, 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
CoRR, 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
CoRR, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space.
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
CoRR, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
Int. J. Comput. Vis., 2020
Proceedings of the 37th International Conference on Machine Learning, 2020
2019
Proceedings of the 36th International Conference on Machine Learning, 2019
Proceedings of the 7th International Conference on Learning Representations, 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Proceedings of the 30th British Machine Vision Conference 2019, 2019