Hannah Kirk

Orcid: 0000-0002-7419-5993

According to our database, Hannah Kirk authored at least 26 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation.
CoRR, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models.
CoRR, 2024

2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023

Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West.
CoRR, 2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
CoRR, 2023

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures.
CoRR, 2023

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets.
CoRR, 2023

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.
CoRR, 2023

Assessing Language Model Deployment with Risk Cards.
CoRR, 2023

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023

Auditing large language models: a three-layered approach.
CoRR, 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning.
CoRR, 2022

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements.
CoRR, 2022

Handling and Presenting Harmful Text.
CoRR, 2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
CoRR, 2022

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

Handling and Presenting Harmful Text in NLP Research.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset.
CoRR, 2021

How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases.
CoRR, 2021

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021