Darren Teh

According to our database, Darren Teh authored at least 5 papers between 2025 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data.
CoRR, March, 2026

ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset.
CoRR, February, 2026

DatBench: Discriminative, Faithful, and Efficient VLM Evaluations.
CoRR, January, 2026

2025
Luxical: High-Speed Lexical-Dense Text Embeddings.
CoRR, December, 2025

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining.
CoRR, August, 2025

