We stand with Ukraine

Shitao Xiao

Orcid: 0000-0003-2567-6843

According to our database, Shitao Xiao authored at least 63 papers between 2021 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

InfoFlow: Reinforcing Search Agent Via Reward Density Optimization.

CoRR, October, 2025

MR²-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval.

CoRR, September, 2025

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling.

CoRR, September, 2025

Task-Aware KV Compression For Cost-Effective Long Video Understanding.

CoRR, June, 2025

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification.

Co-authors include Zhengyang Liang.

CoRR, June, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

CoRR, June, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark.

Co-authors include Kenneth C. Enevoldsen, Dominik Krzeminski, Genta Indra Winata, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Jonathan Rystrøm, Roman Solomatin, Martin Bernstorff, Akshita Sukhlecha, Kranthi Kiran GV, Björn Plüster, Jan Philipp Harries, Mariya Hendriksen, Hippolyte Gisserot-Boukhlef, Konrad Wojtasik, Andrianos Michail, Aleksei Vatolin, Pranjal A. Chitale, Simone Tedeschi, Michael Günther, Gayatri Krishnakumar, Maria Tikhonova, Aleksandr Abramov, Malte Ostendorff, Simon Clematide, Lester James V. Miranda, Alena Fenogenova, Ruqiya Bin Safi, Alessia Borghini, Federico Cassano, Vaibhav Adlakha, and Niklas Muennighoff.

CoRR, February, 2025

EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models.

CoRR, February, 2025

Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width.

CoRR, January, 2025

Fitting Into Any Shape: A Flexible LLM-Based Re-Ranker With Configurable Depth and Width.

Co-authors include Chen Jason Zhang.

Proceedings of the ACM on Web Conference 2025, 2025

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.

Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, 2025

Long Context Compression with Activation Beacon.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking.

Co-authors include David A. Clifton.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Making Text Embedders Few-Shot Learners.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

OmniGen: Unified Image Generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

MLVU: Benchmarking Multi-task Long Video Understanding.

Co-authors include Zhengyang Liang.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

FineRAG: Fine-grained Retrieval-Augmented Text-to-Image Generation.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval.

Co-authors include Zhengyang Liang and Chen Jason Zhang.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval.

Co-authors include Chen Jason Zhang.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.

CoRR, 2024

OmniGen: Unified Image Generation.

CoRR, 2024

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking.

Co-authors include David A. Clifton.

CoRR, 2024

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.

CoRR, 2024

Extending Llama-3's Context Ten-Fold Overnight.

CoRR, 2024

Extensible Embedding: A Flexible Multipler For LLM's Context Length.

CoRR, 2024

BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.

CoRR, 2024

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.

CoRR, 2024

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization.

CoRR, 2024

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon.

CoRR, 2024

C-Pack: Packed Resources For General Chinese Embeddings.

Co-authors include Niklas Muennighoff.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

A Multi-Task Embedder For Retrieval Augmented LLMs.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

LM-Cocktail: Resilient Tuning of Language Models via Model Merging.

Findings of the Association for Computational Linguistics, 2024

Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.

Findings of the Association for Computational Linguistics, 2024

2023

Making Large Language Models A Better Foundation For Dense Retrieval.

CoRR, 2023

Retrieve Anything To Augment Large Language Models.

CoRR, 2023

C-Pack: Packaged Resources To Advance General Chinese Embedding.

Co-authors include Niklas Muennighoff.

CoRR, 2023

LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Diversity-aware Deep Ranking Network for Recommendation.

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

An overview of brain-like computing: Architecture, applications, and future trends.

Frontiers Neurorobotics, 2022

RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.

CoRR, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.

CoRR, 2022

A Mutually Reinforced Framework for Pretrained Sentence Embeddings.

CoRR, 2022

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search.

Co-authors include Premkumar Srinivasan.

CoRR, 2022

LECF: recommendation via learnable edge collaborative filtering.

Sci. China Inf. Sci., 2022

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval.

Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

MINDSim: User Simulator for News Recommenders.

Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.

Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Uni-Retriever: Towards Learning the Unified Embedding Based Retriever in Bing Sponsored Search.

Co-authors include Premkumar Srinivasan.

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop.

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

GraphFormers: GNN-nested Language Models for Linked Text Representation.

CoRR, 2021

Search-oriented Differentiable Product Quantization.

CoRR, 2021

Training Microsoft News Recommenders with Pretrained Language Models in the Loop.

CoRR, 2021

GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Matching-oriented Embedding Quantization For Ad-hoc Retrieval.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
