Zheng Liu

Orcid: 0000-0002-0405-2348

Affiliations:
  • Beijing Academy of Artificial Intelligence, China
  • Huawei Poisson Lab, Shenzhen, China
  • Microsoft Research Asia, Beijing, China
  • Hong Kong University of Science & Technology, Department of Computer Science and Engineering, Hong Kong


According to our database, Zheng Liu authored at least 151 papers between 2014 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Pre-Trained Models for Search and Recommendation: Introduction to the Special Issue - Part 2.
ACM Trans. Inf. Syst., September, 2025

Loki's Dance of Illusions: A Comprehensive Survey of Hallucination in Large Language Models.
CoRR, July, 2025

Task-Aware KV Compression For Cost-Effective Long Video Understanding.
CoRR, June, 2025

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification.
CoRR, June, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.
CoRR, June, 2025

VideoDeepResearch: Long Video Understanding With Agentic Tool Using.
CoRR, June, 2025

Fact in Fragments: Deconstructing Complex Claims via LLM-based Atomic Fact Extraction and Verification.
CoRR, June, 2025

Towards Effective Code-Integrated Reasoning.
CoRR, May, 2025

SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis.
CoRR, May, 2025

Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization.
CoRR, May, 2025

Towards A Generalist Code Embedding Model Based On Massive Data Synthesis.
CoRR, May, 2025

Pre-Trained Models for Search and Recommendation: Introduction to the Special Issue - Part 1.
ACM Trans. Inf. Syst., March, 2025

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models.
CoRR, March, 2025

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding.
CoRR, March, 2025

Memory-enhanced Retrieval Augmentation for Long Video Understanding.
CoRR, March, 2025

An Empirical Study on Eliciting and Improving R1-like Reasoning Models.
CoRR, March, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark.
CoRR, February, 2025

HawkBench: Investigating Resilience of RAG Methods on Stratified Information-Seeking Tasks.
CoRR, February, 2025

MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos.
CoRR, February, 2025

Reinforced Information Retrieval.
CoRR, February, 2025

Does RAG Really Perform Bad For Long-Context Processing?
CoRR, February, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
CoRR, February, 2025

O1 Embedder: Let Retrievers Think Before Action.
CoRR, February, 2025

Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width.
CoRR, January, 2025

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM.
CoRR, January, 2025

MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation.
Proceedings of the ACM on Web Conference 2025, 2025

Fitting Into Any Shape: A Flexible LLM-Based Re-Ranker With Configurable Depth and Width.
Proceedings of the ACM on Web Conference 2025, 2025

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.
Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

Tackling the Length Barrier: Dynamic Context Browsing for Knowledge-Intensive Task.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

Long Context Compression with Activation Beacon.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Making Text Embedders Few-Shot Learners.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniGen: Unified Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MLVU: Benchmarking Multi-task Long Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Evaluation With Multi-Hop QA.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Boosting Long-Context Information Seeking via Query-Guided Activation Refilling.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Reinforced IR: A Self-Boosting Framework For Domain-Adapted Information Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MRR-FV: Unlocking Complex Fact Verification with Multi-Hop Retrieval and Reasoning.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
When large language models meet personalization: perspectives of challenges and opportunities.
World Wide Web (WWW), July, 2024

Boosting Long-Context Management via Query-Guided Activation Refilling.
CoRR, 2024

Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems.
CoRR, 2024

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search.
CoRR, 2024

AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant.
CoRR, 2024

Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs.
CoRR, 2024

Elephant in the Room: Unveiling the Impact of Reward Model Quality in Alignment.
CoRR, 2024

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.
CoRR, 2024

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding.
CoRR, 2024

OmniGen: Unified Image Generation.
CoRR, 2024

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey.
CoRR, 2024

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery.
CoRR, 2024

SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement.
CoRR, 2024

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.
CoRR, 2024

Are Long-LLMs A Necessity For Long-Context Tasks?
CoRR, 2024

Extending Llama-3's Context Ten-Fold Overnight.
CoRR, 2024

Understanding Privacy Risks of Embeddings Induced by Large Language Models.
CoRR, 2024

Extensible Embedding: A Flexible Multipler For LLM's Context Length.
CoRR, 2024

BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.
CoRR, 2024

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.
CoRR, 2024

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization.
CoRR, 2024

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon.
CoRR, 2024

Information Retrieval Meets Large Language Models.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024


Metacognitive Retrieval-Augmented Large Language Models.
Proceedings of the ACM on Web Conference 2024, 2024

Generative Retrieval via Term Set Generation.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

C-Pack: Packed Resources For General Chinese Embeddings.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Boosting the Potential of Large Language Models with an Intelligent Information Assistant.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

RAG-Studio: Towards In-Domain Adaptation of Retrieval Augmented Generation Through Self-Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Negating Negatives: Alignment with Human Negative Samples via Distributional Dispreference Optimization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

A Multi-Task Embedder For Retrieval Augmented LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LM-Cocktail: Resilient Tuning of Language Models via Model Merging.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Grounding Language Model with Chunking-Free In-Context Retrieval.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Semi-Supervised Variational User Identity Linkage via Noise-Aware Self-Learning.
IEEE Trans. Knowl. Data Eng., October, 2023

An Adaptive Graph Pre-training Framework for Localized Collaborative Filtering.
ACM Trans. Inf. Syst., April, 2023

CDSM: Cascaded Deep Semantic Matching on Textual Graphs Leveraging Ad-hoc Neighbor Selection.
ACM Trans. Intell. Syst. Technol., April, 2023

Reinforcement Routing on Proximity Graph for Efficient Recommendation.
ACM Trans. Inf. Syst., January, 2023

Making Large Language Models A Better Foundation For Dense Retrieval.
CoRR, 2023

Retrieve Anything To Augment Large Language Models.
CoRR, 2023

C-Pack: Packaged Resources To Advance General Chinese Embedding.
CoRR, 2023

When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities.
CoRR, 2023

Term-Sets Can Be Strong Document Identifiers For Auto-Regressive Search Engines.
CoRR, 2023

WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus.
CoRR, 2023

Cooperative Retriever and Ranker in Deep Recommenders.
Proceedings of the ACM Web Conference 2023, 2023

RecStudio: Towards a Highly-Modularized Recommender System.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Constructing Tree-based Index for Efficient and Effective Dense Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Longtriever: a Pre-trained Long Text Encoder for Dense Document Retrieval.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Towards Efficient and Effective Transformers for Sequential Recommendation.
Proceedings of the Database Systems for Advanced Applications, 2023

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.
CoRR, 2022

Bi-Phase Enhanced IVFPQ for Time-Efficient Ad-hoc Retrieval.
CoRR, 2022

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?
CoRR, 2022

A Neural Corpus Indexer for Document Retrieval.
CoRR, 2022

RetroMAE: Pre-training Retrieval-oriented Transformers via Masked Auto-Encoder.
CoRR, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
CoRR, 2022

A Mutually Reinforced Framework for Pretrained Sentence Embeddings.
CoRR, 2022

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search.
CoRR, 2022

GateFormer: Speeding Up News Feed Recommendation with Input Gated Transformers.
CoRR, 2022

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

MINDSim: User Simulator for News Recommenders.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Forest-based Deep Recommender.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

A Neural Corpus Indexer for Document Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Recommender Forest for Efficient Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Uni-Retriever: Towards Learning the Unified Embedding Based Retriever in Bing Sponsored Search.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Anisotropic Additive Quantization for Fast Inner Product Search.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Semi-Supervised Variational User Identity Linkage via Noise-Aware Self-Learning.
CoRR, 2021

GraphFormers: GNN-nested Language Models for Linked Text Representation.
CoRR, 2021

Hybrid Encoder: Towards Efficient and Precise Native AdsRecommendation via Hybrid Transformer Encoding Networks.
CoRR, 2021

Search-oriented Differentiable Product Quantization.
CoRR, 2021

Training Microsoft News Recommenders with Pretrained Language Models in the Loop.
CoRR, 2021

Multi-Interest-Aware User Modeling for Large-Scale Sequential Recommendations.
CoRR, 2021

AdsGNN: Behavior-Graph Augmented Relevance Modeling in Sponsored Search.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Lighter and Better: Low-Rank Decomposed Self-Attention Networks for Next-Item Recommendation.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reinforced Anchor Knowledge Graph Generation for News Recommendation Reasoning.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Matching-oriented Embedding Quantization For Ad-hoc Retrieval.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Leveraging Bidding Graphs for Advertiser-Aware Relevance Modeling in Sponsored Search.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
LightRec: A Memory and Search-Efficient Recommender System.
Proceedings of the WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020

Leveraging Demonstrations for Reinforcement Recommendation Reasoning over Knowledge Graphs.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Octopus: Comprehensive and Elastic User Representation for the Generation of Recommendation Candidates.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Sampling-Decomposable Generative Adversarial Recommender.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Fine-grained Interest Matching for Neural News Recommendation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
A Novel User Representation Paradigm for Making Personalized Candidate Retrieval.
CoRR, 2019

Hi-Fi Ark: Deep User Representation via High-Fidelity Archive Network.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Neural News Recommendation with Long- and Short-term User Representations.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Context-aware Academic Collaborator Recommendation.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

Realtime Traffic Speed Estimation with Sparse Crowdsourced Data.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

2017
Worker Recommendation for Crowdsourced Q&A Services: A Triple-Factor Aware Approach.
Proc. VLDB Endow., 2017

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Tuning Crowdsourced Human Computation.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

2016
Tuning Crowdsourced Human Computation.
CoRR, 2016

Mutual benefit aware task assignment in a bipartite labor market.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

2015
Cleaning uncertain data with a noisy crowd.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

2014
gMission: A General Spatial Crowdsourcing Platform.
Proc. VLDB Endow., 2014

