Peng Xia

ORCID: 0000-0003-2676-9128

Affiliations:
  • University of North Carolina at Chapel Hill, NC, USA
  • Monash University, Faculty of Information Technology, Melbourne, VIC, Australia (PhD 2024)
  • Soochow University, School of Computer Science and Technology, Suzhou, China (former)


According to our database, Peng Xia authored at least 19 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding.
CoRR, March, 2025

MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation.
CoRR, February, 2025

Anyprefer: An Agentic Framework for Preference Data Synthesis.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Neighbor Does Matter: Density-Aware Contrastive Learning for Medical Semi-supervised Segmentation.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Towards Realistic Semi-supervised Medical Image Classification.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization.
CoRR, 2024

Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification.
CoRR, 2024

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, 2024

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

2023
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition.
CoRR, 2023

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Chinese grammatical error correction based on knowledge distillation.
CoRR, 2022

2019
Topic model with incremental vocabulary based on Belief Propagation.
Knowl. Based Syst., 2019

