Dirk Groeneveld

According to our database, Dirk Groeneveld authored at least 13 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
OLMo: Accelerating the Science of Language Models.
CoRR, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.
CoRR, 2024

2023
Paloma: A Benchmark for Evaluating Language Model Fit.
CoRR, 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets.
CoRR, 2023

What's In My Big Data?
CoRR, 2023

Large Language Model Distillation Doesn't Need a Teacher.
CoRR, 2023

2022
Continued Pretraining for Better Zero- and Few-Shot Promptability.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Documenting the English Colossal Clean Crawled Corpus.
CoRR, 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project.
AI Mag., 2020

A Simple Yet Strong Pipeline for HotpotQA.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2018
Construction of the Literature Graph in Semantic Scholar.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

2016
IKE - An Interactive Tool for Knowledge Extraction.
Proceedings of the 5th Workshop on Automated Knowledge Base Construction, 2016
