Colin Raffel

Guillaume Rabusseau

CoRR, April, 2026

The Appeal and Reality of Recycling LoRAs with Adaptive Merging.

CoRR, February, 2026

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale.

Ajay Patel

Chris Callison-Burch

CoRR, January, 2026

Uncovering Language Model Processing Strategies with Non-Negative Per-Example Fisher Factorization.

Trans. Mach. Learn. Res., 2026

2025

Efficiently Estimating Data Efficiency for Language Model Fine-tuning.

Gyung Hyun Je

CoRR, December, 2025

TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior.

CoRR, December, 2025

FineWeb2: One Pipeline to Scale Them All - Adapting Pre-Training Data Processing to Every Language.

Amir Hossein Kargaran

CoRR, June, 2025

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text.

CoRR, June, 2025

SmolLM2: When Smol Goes Big - Data-Centric Training of a Small Language Model.

CoRR, February, 2025

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning.

Trans. Mach. Learn. Res., 2025

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization.

Trans. Mach. Learn. Res., 2025

Enhancing Training Data Attribution with Representational Optimization.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Position: The Most Expensive Part of an LLM *should* be its Training Data.

Nikhil Kandpal

Proceedings of the Forty-second International Conference on Machine Learning, 2025

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution.

Fengyuan Liu

Nikhil Kandpal

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Combining Machine Learning and Lifetime-Based Resource Management for Memory Allocation and Beyond.

Martin Maas

David G. Andersen

Michael Isard

Mohammad Mahdi Javanmard

Kathryn S. McKinley

Commun. ACM, April, 2024

Merging by Matching Models in Task Parameter Subspaces.

Derek Tam

Mohit Bansal

Trans. Mach. Learn. Res., 2024

Soft Merging of Experts with Adaptive Routing.

Mohammed Muqeeth

Haokun Liu

Trans. Mach. Learn. Res., 2024

A Survey on Data Selection for Language Models.

Trans. Mach. Learn. Res., 2024

Realistic Evaluation of Model Merging for Compositional Generalization.

CoRR, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models.

CoRR, 2024

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning to Route Among Specialized Experts for Zero-Shot Generalization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows.

Ajay Patel

Chris Callison-Burch

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Building Machine Learning Models Like Open Source Software.

Giambattista Parascandolo

Commun. ACM, February, 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Bartlomiej Bojanowski

Christopher D. Manning

Daniel Moseguí González

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Giorgio Mariani

Gloria Wang

Gonzalo Jaimovitch-López

Jaime Fernández Fisac

Jascha Sohl-Dickstein

José Hernández-Orallo

Karthik Gopalakrishnan

Lidia Contreras Ochando

Louis-Philippe Morency

María José Ramírez-Quintana

Michael I. Ivanitskiy

Neta Gur-Ari Krakover

Nitish Shirish Keskar

Pablo Antonio Moreno Casares

Pegah Alipoormolabashi

Shyamolima (Shammie) Debnath

Sneha Priscilla Makini

Yadollah Yaghoobzadeh

Trans. Mach. Learn. Res., 2023

Efficient Methods for Natural Language Processing: A Survey.

Trans. Assoc. Comput. Linguistics, 2023

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP.

Trans. Assoc. Comput. Linguistics, 2023

Scaling Up Models and Data with t5x and seqio.

J. Mach. Learn. Res., 2023

Merging by Matching Models in Task Subspaces.

Derek Tam

Mohit Bansal

CoRR, 2023

Efficient Online Data Mixing For Language Model Pre-Training.

CoRR, 2023

NPEFF: Non-Negative Per-Example Fisher Factorization.

CoRR, 2023

Resolving Interference When Merging Models.

CoRR, 2023

TIES-Merging: Resolving Interference When Merging Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scaling Data-Constrained Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Distributed Inference and Fine-tuning of Large Language Models Over The Internet.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data.

Alon Albalak

Colin A. Raffel

William Yang Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models.

Proceedings of the International Conference on Machine Learning, 2023

Large Language Models Struggle to Learn Long-Tail Knowledge.

Proceedings of the International Conference on Machine Learning, 2023

Bidirectional Language Models Are Also Few-shot Learners.

Ajay Patel

Bryan Li

Mohammad Sadegh Rasooli

Noah Constant

Chris Callison-Burch

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Knowledge is a Region in Weight Space for Fine-tuned Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model.

Haikang Deng

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Evaluating the Factual Consistency of Large Language Models Through News Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Crosslingual Generalization through Multitask Finetuning.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Petals: Collaborative Inference and Fine-tuning of Large Models.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022

Emergent Abilities of Large Language Models.

Trans. Mach. Learn. Res., 2022

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models.

Trans. Assoc. Comput. Linguistics, 2022

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning.

CoRR, 2022

Evaluating the Factual Consistency of Large Language Models Through Summarization.

CoRR, 2022

Petals: Collaborative Inference and Fine-tuning of Large Models.

CoRR, 2022

Efficient Methods for Natural Language Processing: A Survey.

Pedro Henrique Martins

Niranjan Balasubramanian

Leon Derczynski

Roy Schwartz

CoRR, 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

CoRR, 2022

Scaling Up Models and Data with t5x and seqio.

CoRR, 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.

CoRR, 2022

Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language.

Zhenlin Xu

Marc Niethammer

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Combinatorial Perspective on the Optimization of Shallow ReLU Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Merging Models with Fisher-Weighted Averaging.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?

Proceedings of the International Conference on Machine Learning, 2022

Deduplicating Training Data Mitigates Privacy Risks in Language Models.

Nikhil Kandpal

Eric Wallace

Proceedings of the International Conference on Machine Learning, 2022

Multitask Prompted Training Enables Zero-Shot Task Generalization.

Proceedings of the Tenth International Conference on Learning Representations, 2022

What Language Model to Train if You Have One Million GPU Hours?

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Learning with Limited Text Data.

Diyi Yang

Ankur P. Parikh

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

2021

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP.

CoRR, 2021

Multitask Prompted Training Enables Zero-Shot Task Generalization.

CoRR, 2021

Extracting Training Data from Large Language Models.

Proceedings of the 30th USENIX Security Symposium, 2021

Training Neural Networks with Fixed Sparse Masks.

Yi-Lin Sung

Varun Nair

Michael Sejr Schlichtkrull

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Robust and Generalizable Visual Representation Learning via Random Convolutions.

Proceedings of the 9th International Conference on Learning Representations, 2021

Improving and Simplifying Pattern Exploiting Training.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.

J. Mach. Learn. Res., 2020

WT5?! Training Text-to-Text Models to Explain their Predictions.

CoRR, 2020

Deflecting Adversarial Attacks.

CoRR, 2020

Top-K Training of GANs: Improving Generators by Making Critics Less Critical.

CoRR, 2020

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned.

Sewon Min

Jordan L. Boyd-Graber

Sonal Gupta

Yashar Mehdad

Wen-tau Yih

Proceedings of the NeurIPS 2020 Competition and Demonstration Track, 2020

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions.

Proceedings of the 8th International Conference on Learning Representations, 2020

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring.

Proceedings of the 8th International Conference on Learning Representations, 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

Adam Roberts

Noam Shazeer

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning-based Memory Allocation for C++ Server Workloads.

Martin Maas

David G. Andersen

Michael Isard

Mohammad Mahdi Javanmard

Kathryn S. McKinley

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring.

CoRR, 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.

CoRR, 2019

MixMatch: A Holistic Approach to Semi-Supervised Learning.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition.

Proceedings of the 36th International Conference on Machine Learning, 2019

Towards GAN Benchmarks Which Require Generalization.

Ishaan Gulrajani

Luke Metz

Proceedings of the 7th International Conference on Learning Representations, 2019

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer.

Proceedings of the 7th International Conference on Learning Representations, 2019

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Hickle: A HDF5-based python pickle replacement.

J. Open Source Softw., 2018

Learning a Latent Space of Multitrack Measures.

CoRR, 2018

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Onsets and Frames: Dual-Objective Piano Transcription.

Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music.

Proceedings of the 35th International Conference on Machine Learning, 2018

Is Generator Conditioning Causally Related to GAN Performance?

Proceedings of the 35th International Conference on Machine Learning, 2018

Realistic Evaluation of Semi-Supervised Learning Algorithms.

Proceedings of the 6th International Conference on Learning Representations, 2018

Monotonic Chunkwise Attention.

Chung-Cheng Chiu

Proceedings of the 6th International Conference on Learning Representations, 2018

Thermometer Encoding: One Hot Way To Resist Adversarial Examples.

Proceedings of the 6th International Conference on Learning Representations, 2018

Learning Hard Alignments with Variational Inference.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Online and Linear-Time Attention by Enforcing Monotonic Alignments.

Proceedings of the 34th International Conference on Machine Learning, 2017

Training a Subsampling Mechanism in Expectation.

Dieterich Lawson

Proceedings of the 5th International Conference on Learning Representations, 2017

Explaining the Learning Dynamics of Direct Feedback Alignment.

Jascha Sohl-Dickstein

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching.

Nicolas Boulanger-Lewandowski

PhD thesis, 2016

Theano: A Python framework for fast computation of mathematical expressions.

Xavier Bouthillier

Alexandre de Brébisson

Samira Ebrahimi Kahou

Pierre-Antoine Manzagol

Christopher Joseph Pal

S. Ramana Subramanyam

CoRR, 2016

Extracting Ground-Truth Information from MIDI Files: A MIDIfesto.

Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016

Pruning subsequence search with attention-based embedding.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Optimizing DTW-based audio-to-MIDI alignment and matching.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games.

CoRR, 2015

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems.

CoRR, 2015

librosa: Audio and Music Signal Analysis in Python.

Proceedings of the 14th Python in Science Conference, 2015

Large-Scale Content-Based Matching of MIDI and Audio Files.

Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015

2014

MIR_EVAL: A Transparent Implementation of Common MIR Metrics.

Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Estimating timing and channel distortion across related signals.
