Paul Röttger

According to our database, Paul Röttger authored at least 20 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset.
CoRR, 2024

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ.
CoRR, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models.
CoRR, 2024

"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models.
CoRR, 2024

2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions.
CoRR, 2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
CoRR, 2023

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics.
CoRR, 2023

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models.
CoRR, 2022

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

HateCheck: Functional Tests for Hate Speech Detection Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

