Dirk Groeneveld

Orcid: 0000-0002-8274-768X

According to our database, Dirk Groeneveld authored at least 28 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of three.
Erdős number of four.

Bibliography

2026

Olmo Hybrid: From Theory to Practice and Back.

CoRR, April, 2026

2025

Olmo 3.

Lester James V. Miranda

CoRR, December, 2025

FlexOlmo: Open Language Models for Flexible Data Use.

CoRR, July, 2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens.

CoRR, April, 2025

2 OLMo 2 Furious.

CoRR, January, 2025

FlexOLMo: Open Language Models for Flexible Data Use.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

DataDecide: How to Predict Best Pretraining Data with Small Experiments.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

OLMoE: Open Mixture-of-Experts Language Models.

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2025

2024

Establishing Task Scaling Laws via Compute-Efficient Model Ladders.

CoRR, 2024

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models.

CoRR, 2024

OLMoE: Open Mixture-of-Experts Language Models.

CoRR, 2024

Paloma: A Benchmark for Evaluating Language Model Fit.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

DataComp-LM: In search of the next generation of training sets for language models.

Khyathi Raghavi Chandu

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

What's In My Big Data?

Yanai Elazar

Akshita Bhagia

Ian Magnusson

Abhilasha Ravichander

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

OLMo: Accelerating the Science of Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets.

CoRR, 2023

Large Language Model Distillation Doesn't Need a Teacher.

CoRR, 2023

2022

Continued Pretraining for Better Zero- and Few-Shot Promptability.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Documenting the English Colossal Clean Crawled Corpus.

CoRR, 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project.

Sumithra Bhakthavatsalam

Dirk Groeneveld

Michal Guerquin

Michael Schmitz

AI Mag., 2020

A Simple Yet Strong Pipeline for HotpotQA.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2018

Construction of the Literature Graph in Semantic Scholar.

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

2016

IKE - An Interactive Tool for Knowledge Extraction.

Bhavana Dalvi

Sumithra Bhakthavatsalam

Proceedings of the 5th Workshop on Automated Knowledge Base Construction, 2016
