Luke Merrick

According to our database, Luke Merrick authored at least 11 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data.
CoRR, March, 2026

ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset.
CoRR, February, 2026

DatBench: Discriminative, Faithful, and Efficient VLM Evaluations.
CoRR, January, 2026

2025
Luxical: High-Speed Lexical-Dense Text Embeddings.
CoRR, December, 2025

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining.
CoRR, August, 2025

2024
Arctic-Embed 2.0: Multilingual Retrieval Without Compromise.
CoRR, 2024

Embedding And Clustering Your Data Can Improve Contrastive Pretraining.
CoRR, 2024

Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models.
CoRR, 2024

2020
The Explanation Game: Explaining Machine Learning Models Using Shapley Values.
Proceedings of Machine Learning and Knowledge Extraction, 2020

2019
Randomized Ablation Feature Importance.
CoRR, 2019

The Explanation Game: Explaining Machine Learning Models with Cooperative Game Theory.
CoRR, 2019

