Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect.

Alina Klerings

Jannik Brinkmann

Daniel Ruffinelli

Simone Paolo Ponzetto

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability.

Aruna Sankaranarayanan

CoRR, 2024

NNsight and NDIF: Democratizing Access to Foundation Model Internals.

CoRR, 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.

Claudio Mayrink Verdun

David Bau

Samuel Marks

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Unsupervised Extraction of Test Scenarios from Time-Series Sensor Data using Trace Graphs.

Jannik Brinkmann

Noah Metzger

Christian Bartelt

Proceedings of the 57th Hawaii International Conference on System Sciences, 2024

GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems.

Ashish Rana

Michael Oesterle

Jannik Brinkmann

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

A Multidimensional Analysis of Social Biases in Vision Transformers.

Jannik Brinkmann

Paul Swoboda

Christian Bartelt

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Bias Mitigation for Large Language Models using Adversarial Learning.

Proceedings of the 1st Workshop on Fairness and Bias in AI co-located with 26th European Conference on Artificial Intelligence (ECAI 2023), 2023

Jannik Brinkmann
