Aaron Mueller

ORCID: 0009-0005-1148-5001

According to our database, Aaron Mueller authored at least 54 papers between 2019 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Pitfalls in Evaluating Interpretability Agents.
CoRR, March, 2026

BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop.
CoRR, February, 2026

Causality is Key for Interpretability Claims to Generalise.
CoRR, February, 2026

Mechanisms of AI Protein Folding in ESMFold.
CoRR, February, 2026

Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2025
From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?
CoRR, December, 2025

BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models.
CoRR, December, 2025

Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models.
CoRR, November, 2025

In-Context Learning Without Copying.
CoRR, November, 2025

Priors in Time: Missing Inductive Biases for Language Model Interpretability.
CoRR, November, 2025

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining.
CoRR, September, 2025

CRISP: Persistent Concept Unlearning via Sparse Autoencoders.
CoRR, August, 2025

How to Improve the Robustness of Closed-Source Models on NLI.
CoRR, May, 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora.
CoRR, April, 2025

BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop.
CoRR, February, 2025

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models.
CoRR, January, 2025

Characterizing the Role of Similarity in the Property Inferences of Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

SAEs Are Good for Steering - If You Select the Right Features.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Position-aware Automatic Circuit Discovery.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora.
CoRR, 2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability.
CoRR, 2024

NNsight and NDIF: Democratizing Access to Foundation Model Internals.
CoRR, 2024

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks.
CoRR, 2024

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus.
CoRR, 2024

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Function Vectors in Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Insights from the first BabyLM Challenge: Training sample-efficient language models on a developmentally plausible corpus.
Proceedings of the 46th Annual Meeting of the Cognitive Science Society, 2024

2023
Inverse Scaling: When Bigger Isn't Better.
Trans. Mach. Learn. Res., 2023

Call for Papers - The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus.
CoRR, 2023

Language model acceptability judgements are not always robust to context.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Bernice: A Multilingual Pre-trained Encoder for Twitter.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models.
Proceedings of the 26th Conference on Computational Natural Language Learning, 2022

Label Semantic Aware Pre-training for Few-shot Text Classification.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement.
Proc. ACM Hum. Comput. Interact., 2021

Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Decoding Methods for Neural Narrative Generation.
CoRR, 2020

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Cross-Linguistic Syntactic Evaluation of Word Prediction Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
Quantity doesn't buy quality syntax with neural language models.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Modeling Color Terminology Across Thousands of Languages.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

