Chaoya Jiang
Orcid: 0009-0009-7282-159X
According to our database, Chaoya Jiang authored at least 21 papers between 2020 and 2025.
  
  
Bibliography
  2025
Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning.
    
  
    CoRR, September, 2025
    
  
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models.
    
  
    CoRR, June, 2025
    
  
VLM-R<sup>3</sup>: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought.
    
  
    CoRR, May, 2025
    
  
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization.
    
  
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
    
  
  2024
    CoRR, 2024
    
  
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model.
    
  
    Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
    
  
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models.
    
  
    Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
    
  
    Proceedings of the Twelfth International Conference on Learning Representations, 2024
    
  
    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
    
  
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
    
  
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
    
  
    Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
    
  
  2023
    CoRR, 2023
    
  
Vision Langauge Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation.
    
  
    CoRR, 2023
    
  
COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.
    
  
    Proceedings of the 31st ACM International Conference on Multimedia, 2023
    
  
BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization.
    
  
    Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
    
  
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation.
    
  
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
    
  
    Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
    
  
  2022
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection.
    
  
    Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
    
  
  2020
Learn A Robust Representation For Cover Song Identification Via Aggregating Local And Global Music Temporal Context.
    
  
    Proceedings of the IEEE International Conference on Multimedia and Expo, 2020
    
  
Similarity Learning For Cover Song Identification Using Cross-Similarity Matrices of Multi-Level Deep Sequences.
    
  
    Proceedings of the 2020 IEEE International Conference on Acoustics, 2020