Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

What's In My Big Data?

Yanai Elazar

Akshita Bhagia

Ian Magnusson

Abhilasha Ravichander

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Scalable Data Ablation Approximations for Language Models through Modular Training and Merging.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

OLMo: Accelerating the Science of Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Efficient Methods for Natural Language Processing: A Survey.

Trans. Assoc. Comput. Linguistics, 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets.

CoRR, 2023

The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices.

CoRR, 2023

Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation.

CoRR, 2023

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research.

CoRR, 2023

Evaluating the Social Impact of Generative AI Systems in Systems and Society.

Alexandra Sasha Luccioni

CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models.

Alexandra Chronopoulou

Matthew E. Peters

Alexander Fraser

Jesse Dodge

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Stubborn Lexical Bias in Data and Models.

Sofia Serrano

Jesse Dodge

Noah A. Smith

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Reproducibility in NLP: What Have We Learned from the Checklist?

Ian Magnusson

Noah A. Smith

Jesse Dodge

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Efficient and Equitable Natural Language Processing in the Age of Deep Learning (Dagstuhl Seminar 22232).

Dagstuhl Reports, 2022

Data Governance in the Age of Large-Scale Data-Driven Language Technology.

CoRR, 2022

Findings of the WMT'22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages.

David Ifeoluwa Adelani

Md Mahfuz Ibn Alam

Antonios Anastasopoulos

Proceedings of the Seventh Conference on Machine Translation, 2022

Modeling the Machine Learning Multiverse.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Hierarchical Domain Adaptation for Pretrained Language Models.

Alexandra Chronopoulou

Matthew E. Peters

Jesse Dodge

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Staged Training for Transformer Language Models.

Proceedings of the International Conference on Machine Learning, 2022

Data Governance in the Age of Large-Scale Data-Driven Language Technology.

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

Measuring the Carbon Intensity of AI in Cloud Instances.

Jesse Dodge

Taylor Prewitt

Remi Tachet des Combes

Erika Odmark

Roy Schwartz

Emma Strubell

Alexandra Sasha Luccioni

Noah A. Smith

Nicole DeCario

Will Buchanan

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

Towards Reproducible Machine Learning Research in Natural Language Processing.

Ana Lucic

Maurits J. R. Bleeker

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

2021

Documenting the English Colossal Clean Crawled Corpus.

CoRR, 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Expected Validation Performance and Estimation of a Random Variable's Maximum.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Competency Problems: On Finding and Removing Artifacts in Language Data.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping.

CoRR, 2020

Green AI.

Commun. ACM, 2020

The Right Tool for the Job: Matching Model and Instance Complexities.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

RNN Architecture Learning with Sparse Regularization.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Show Your Work: Improved Reporting of Experimental Results.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2017

Random Search for Hyperparameters using Determinantal Point Processes.

Jesse Dodge

Catrìona Anderson

Noah A. Smith

CoRR, 2017

2016

Large Scale Retrieval and Generation of Image Descriptions.

Int. J. Comput. Vis., 2016

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems.

Proceedings of the 4th International Conference on Learning Representations, 2016

Key-Value Memory Networks for Directly Reading Documents.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

2015

Retrofitting Word Vectors to Semantic Lexicons.

Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

2014

CMU: Arc-Factored, Discriminative Semantic Dependency Parsing.
