Haodong Duan

Orcid: 0000-0002-3052-4177

According to our database, Haodong Duan authored at least 63 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Intern-S1: A Scientific Multimodal Foundation Model.
CoRR, August, 2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents.
CoRR, July, 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems.
CoRR, June, 2025

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs.
CoRR, June, 2025

Affordance Benchmark for MLLMs.
CoRR, June, 2025

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence.
CoRR, May, 2025

Visual Agentic Reinforcement Fine-Tuning.
CoRR, May, 2025

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling.
CoRR, May, 2025

MM-IFEngine: Towards Multimodal Instruction Following.
CoRR, April, 2025

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing.
CoRR, April, 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
CoRR, March, 2025

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM.
CoRR, March, 2025

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.
CoRR, March, 2025

Information Density Principle for MLLM Benchmarks.
CoRR, March, 2025

Visual-RFT: Visual Reinforcement Fine-Tuning.
CoRR, March, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
CoRR, February, 2025

VideoRoPE: What Makes for Good Video Rotary Position Embedding?
CoRR, February, 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement.
CoRR, January, 2025

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.
CoRR, January, 2025

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Image Quality Assessment: From Human to Machine Preference.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Redundancy Principles for MLLMs Benchmarks.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs.
CoRR, 2024

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

SkeleTR: Towards Skeleton-based Action Recognition in the Wild.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SkeleTR: Towards Skeleton-based Action Recognition in the Wild.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-Supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition.
CoRR, 2022

PYSKL: Towards Good Practices for Skeleton Action Recognition.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

OCSampler: Compressing Videos to One Clip with Single-step Sampling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Revisiting Skeleton-based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Revisiting Skeleton-based Action Recognition.
CoRR, 2021

2020
Omni-Sourced Webly-Supervised Learning for Video Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
TRB: A Novel Triplet Representation for Understanding 2D Human Body.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2017
SRPGAN: Perceptual Generative Adversarial Network for Single Image Super Resolution.
CoRR, 2017

