Amanda Askell

According to our database, Amanda Askell authored at least 24 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

2023
Evaluating and Mitigating Discrimination in Language Model Decisions.
CoRR, 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

Towards Understanding Sycophancy in Language Models.
CoRR, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023


2022
Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

In-context Learning and Induction Heads.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
CoRR, 2022

Training language models to follow instructions with human feedback.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022


2021
A General Language Assistant as a Laboratory for Alignment.
CoRR, 2021

Learning Transferable Visual Models From Natural Language Supervision.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims.
CoRR, 2020


2019
Release Strategies and the Social Impacts of Language Models.
CoRR, 2019

The Role of Cooperation in Responsible AI Development.
CoRR, 2019

