Ji Zhang

Orcid: 0000-0002-3835-7975

Affiliations:
  • Alibaba Group, DAMO Academy, Hangzhou, China


According to our database, Ji Zhang authored at least 72 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.
CoRR, 2024

Budget-Constrained Tool Learning with Planning.
CoRR, 2024

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models.
CoRR, 2024

PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs.
CoRR, 2024

Model Composition for Multimodal Large Language Models.
CoRR, 2024

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion.
CoRR, 2024

Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement.
CoRR, 2024

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception.
CoRR, 2024

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent.
CoRR, 2024

Knowledge Distillation for Closed-Source Language Models.
CoRR, 2024

SiTunes: A Situational Music Recommendation Dataset with Physiological and Psychological Signals.
Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval, 2024

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Achieving Human Parity on Visual Question Answering.
ACM Trans. Inf. Syst., 2023

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model.
CoRR, 2023

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.
CoRR, 2023

An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation.
CoRR, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
CoRR, 2023

MCC-KD: Multi-CoT Consistent Knowledge Distillation.
CoRR, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
CoRR, 2023

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models.
CoRR, 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models.
CoRR, 2023

CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility.
CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
CoRR, 2023

AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference.
CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.
CoRR, 2023

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human.
CoRR, 2023

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
Proceedings of the International Conference on Machine Learning, 2023

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

MCC-KD: Multi-CoT Consistent Knowledge Distillation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Distinguish Before Answer: Generating Contrastive Explanation as Knowledge for Commonsense Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.
CoRR, 2022

Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment.
CoRR, 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
CoRR, 2022

Auto-MLM: Improved Contrastive Learning for Self-supervised Multi-lingual Knowledge Retrieval.
CoRR, 2022

Multi-label Masked Language Modeling on Zero-shot Code-switched Sentiment Analysis.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

MGIMN: Multi-Grained Interactive Matching Network for Few-shot Text Classification.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Comprehensive Relationship Reasoning for Composed Query Based Image Retrieval.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Relative Alignment Network for Source-Free Multimodal Video Domain Adaptation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Incorporating Casual Analysis into Diversified and Logical Response Generation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Continual Few-shot Intent Detection.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Generating Persuasive Responses to Customer Reviews with Multi-Source Prior Knowledge in E-commerce.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
Achieving Human Parity on Visual Question Answering.
CoRR, 2021

SPMoE: Generate Multiple Pattern-Aware Outputs with Sparse Pattern Mixture of Experts.
CoRR, 2021

OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach.
CoRR, 2021

AliMe DA: A Data Augmentation Framework for Question Answering in Cold-start Scenarios.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

AliMe Avatar: Multi-modal Content Production and Presentation for Live-streaming E-commerce.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

AliMe MKG: A Multi-modal Knowledge Graph for Live-streaming E-commerce.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
AliMe KG: Domain Knowledge Graph Construction and Application in E-commerce.
CoRR, 2020

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

AliMeKG: Domain Knowledge Graph Construction and Application in E-commerce.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Query-to-Session Matching: Do NOT Forget History and Future during Response Selection for Multi-Turn Dialogue Systems.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

2019
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication.
CoRR, 2019

A Deep Cascade Model for Multi-Document Reading Comprehension.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Semi-Autoregressive Neural Machine Translation.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

