Shitao Xiao

Orcid: 0000-0003-2567-6843

According to our database, Shitao Xiao authored at least 30 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
Extensible Embedding: A Flexible Multipler For LLM's Context Length.
CoRR, 2024

BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.
CoRR, 2024

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.
CoRR, 2024

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization.
CoRR, 2024

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon.
CoRR, 2024

2023
Making Large Language Models A Better Foundation For Dense Retrieval.
CoRR, 2023

LM-Cocktail: Resilient Tuning of Language Models via Model Merging.
CoRR, 2023

Retrieve Anything To Augment Large Language Models.
CoRR, 2023

C-Pack: Packaged Resources To Advance General Chinese Embedding.
CoRR, 2023

LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Diversity-aware Deep Ranking Network for Recommendation.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
An overview of brain-like computing: Architecture, applications, and future trends.
Frontiers Neurorobotics, 2022

RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models.
CoRR, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
CoRR, 2022

A Mutually Reinforced Framework for Pretrained Sentence Embeddings.
CoRR, 2022

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search.
CoRR, 2022

LECF: recommendation via learnable edge collaborative filtering.
Sci. China Inf. Sci., 2022

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

MINDSim: User Simulator for News Recommenders.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Uni-Retriever: Towards Learning the Unified Embedding Based Retriever in Bing Sponsored Search.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
GraphFormers: GNN-nested Language Models for Linked Text Representation.
CoRR, 2021

Search-oriented Differentiable Product Quantization.
CoRR, 2021

Training Microsoft News Recommenders with Pretrained Language Models in the Loop.
CoRR, 2021

GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Matching-oriented Embedding Quantization For Ad-hoc Retrieval.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
