Xinyu Zhang

ORCID: 0009-0009-0756-8110

Affiliations:
  • University of Waterloo, David R. Cheriton School of Computer Science, Canada


According to our database, Xinyu Zhang authored at least 44 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent.
CoRR, August, 2025

MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed.
CoRR, June, 2025

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval.
CoRR, May, 2025

A Survey of Model Architectures in Information Retrieval.
CoRR, February, 2025

MMTEB: Massive Multilingual Text Embedding Benchmark.
CoRR, February, 2025

Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Rank-Without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models.
Proceedings of the Advances in Information Retrieval, 2025

The Impact of Incidental Multilingual Text on Cross-Lingual Transfer in Monolingual Retrieval.
Proceedings of the Advances in Information Retrieval, 2025

2024
Toward Best Practices for Training Multilingual Dense Retrieval Models.
ACM Trans. Inf. Syst., March, 2024

Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models.
CoRR, 2024

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM.
CoRR, 2024

CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

"Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Multi-Objective Forward Reasoning and Multi-Reward Backward Refinement for Product Review Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages.
Trans. Assoc. Comput. Linguistics, 2023

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation.
CoRR, 2023

What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations.
CoRR, 2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution.
CoRR, 2023

Zero-Shot Listwise Document Reranking with a Large Language Model.
CoRR, 2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction.
CoRR, 2023

Overview of the CIRAL Track at FIRE 2023: Cross-lingual Information Retrieval for African Languages.
Proceedings of the Working Notes of FIRE 2023, 2023

CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Evaluating Embedding APIs for Information Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023

2022
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages.
CoRR, 2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers.
CoRR, 2022

Towards Best Practices for Training Multilingual Dense Retrieval Models.
CoRR, 2022

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking.
Proceedings of the Advances in Information Retrieval, 2022

2021
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval.
CoRR, 2021

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers.
Proceedings of the Advances in Information Retrieval, 2021

Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21, 2021

2020
Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval.
Proceedings of the WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, 2020

H2oloo at TREC 2020: When all you got is a hammer... Deep Learning, Health Misinformation, and Precision Medicine.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

A Little Bit Is Worse Than None: Ranking with Limited Training Data.
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

Flexible IR Pipelines with Capreolus.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020
