Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024

Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching.

Aleksandar Makelov

Georg Lange

Atticus Geiger

Neel Nanda

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Updating CLIP to Prefer Descriptions Over Captions.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations.

Proceedings of the Conference on Causal Learning and Reasoning, 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Linear Representations of Sentiment in Large Language Models.

Curt Tigges

Oskar John Hollinsworth

Atticus Geiger

Neel Nanda

CoRR, 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca.

CoRR, 2023

Causal Abstraction for Faithful Model Interpretation.

Atticus Geiger

Christopher Potts

Thomas Icard

CoRR, 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Causal Proxy Models for Concept-based Model Explanations.

Proceedings of the International Conference on Machine Learning, 2023

A Semantics for Causing, Enabling, and Preventing Verbs Using Structural Causal Models.

Proceedings of the 45th Annual Meeting of the Cognitive Science Society, 2023

Causal Abstraction with Soft Interventions.

Proceedings of the Conference on Causal Learning and Reasoning, 2023

Rigorously Assessing Natural Language Explanations of Neurons.

Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, 2023

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Causal Distillation for Language Models.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Inducing Causal Structure for Interpretable Neural Networks.

Proceedings of the International Conference on Machine Learning, 2022

2021

Causal Abstractions of Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Dynabench: Rethinking Benchmarking in NLP.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

DynaSent: A Dynamic Benchmark for Sentiment Analysis.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Modular Representation Underlies Systematic Generalization in Neural Natural Language Inference Models.

Atticus Geiger

Kyle Richardson

Christopher Potts

CoRR, 2020

Relational reasoning and generalization using non-symbolic neural networks.

Proceedings of the 42nd Annual Meeting of the Cognitive Science Society, 2020

Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation.

Atticus Geiger

Kyle Richardson

Christopher Potts

Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2020

2019

Recursive Routing Networks: Learning to Compose Modules for Language Understanding.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Posing Fair Generalization Tasks for Natural Language Inference.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences.

CoRR, 2018

Atticus Geiger
