Hannah Cyberey

Jonathan Richard Schwarz

Ahmed Alaa

Thomas Hartvigsen

CoRR, February, 2026

White-Box Sensitivity Auditing with Steering Vectors.

[BibTeX]

[DOI]

CoRR, January, 2026

2025

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control.

[BibTeX]

[DOI]

CoRR, April, 2025

Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs.

[BibTeX]

[DOI]

CoRR, February, 2025

Unsupervised Concept Vector Extraction for Bias Control in LLMs.

[BibTeX]

[DOI]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

The Mismeasure of Man and Models: Evaluating Allocational Harms in Large Language Models.

[BibTeX]

[DOI]

CoRR, 2024

Addressing Both Statistical and Causal Gender Fairness in NLP Models.

[BibTeX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

2022

Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models.

[BibTeX]

[DOI]

David E. Evans

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2020

Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory.

[BibTeX]

[DOI]

David E. Evans

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Pointwise Paraphrase Appraisal is Potentially Problematic.

[BibTeX]

[DOI]