Amanda Askell

According to our database, Amanda Askell authored at least 26 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

CoRR, January, 2025

2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

CoRR, 2024

Towards Understanding Sycophancy in Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Bartlomiej Bojanowski

Christopher D. Manning

Daniel Moseguí González

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Giambattista Parascandolo

Giorgio Mariani

Gloria Wang

Gonzalo Jaimovitch-López

Jaime Fernández Fisac

Jascha Sohl-Dickstein

José Hernández-Orallo

Karthik Gopalakrishnan

Lidia Contreras Ochando

Louis-Philippe Morency

María José Ramírez-Quintana

Michael I. Ivanitskiy

Neta Gur-Ari Krakover

Nitish Shirish Keskar

Pablo Antonio Moreno Casares

Pegah Alipoormolabashi

Shyamolima (Shammie) Debnath

Sneha Priscilla Makini

Yadollah Yaghoobzadeh

Trans. Mach. Learn. Res., 2023

Evaluating and Mitigating Discrimination in Language Model Decisions.

CoRR, 2023

Specific versus General Principles for Constitutional AI.

CoRR, 2023

Towards Understanding Sycophancy in Language Models.

CoRR, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models.

CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.

CoRR, 2023

Discovering Language Model Behaviors with Model-Written Evaluations.

Timothy Telleen-Lawton

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Discovering Language Model Behaviors with Model-Written Evaluations.

Timothy Telleen-Lawton

CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.

Timothy Telleen-Lawton

CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.

CoRR, 2022

In-context Learning and Induction Heads.

CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.

CoRR, 2022

Language Models (Mostly) Know What They Know.

CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.

CoRR, 2022

Predictability and Surprise in Large Generative Models.

CoRR, 2022

Training language models to follow instructions with human feedback.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Predictability and Surprise in Large Generative Models.

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

2021

A General Language Assistant as a Laboratory for Alignment.

CoRR, 2021

Learning Transferable Visual Models From Natural Language Supervision.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims.

Thomas Krendl Gilbert

CoRR, 2020

Language Models are Few-Shot Learners.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Release Strategies and the Social Impacts of Language Models.

CoRR, 2019

The Role of Cooperation in Responsible AI Development.

Amanda Askell

Miles Brundage

Gillian K. Hadfield

CoRR, 2019
