Dario Amodei

Affiliations:
  • Anthropic


According to our database, Dario Amodei authored at least 34 papers between 2013 and 2023.

Bibliography

2023
The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023


2022
Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

In-context Learning and Induction Heads.
CoRR, 2022

Toy Models of Superposition.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Scaling Laws and Interpretability of Learning from Repeated Data.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
CoRR, 2022


2021
A General Language Assistant as a Laboratory for Alignment.
CoRR, 2021

Evaluating Large Language Models Trained on Code.
CoRR, 2021

2020
Scaling Laws for Autoregressive Generative Modeling.
CoRR, 2020

Learning to summarize from human feedback.
CoRR, 2020

Scaling Laws for Neural Language Models.
CoRR, 2020

Learning to summarize with human feedback.
Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020


2019
Fine-Tuning Language Models from Human Preferences.
CoRR, 2019

2018
An Empirical Model of Large-Batch Training.
CoRR, 2018

Supervising strong learners by amplifying weak experts.
CoRR, 2018

Variational Option Discovery Algorithms.
CoRR, 2018

AI safety via debate.
CoRR, 2018

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.
CoRR, 2018

Reward learning from human preferences and demonstrations in Atari.
Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
Deep Reinforcement Learning from Human Preferences.
Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learning a Natural Language Interface with Neural Programmer.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Concrete Problems in AI Safety.
CoRR, 2016


2015
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin.
CoRR, 2015

2014
Searching for Collective Behavior in a Large Network of Sensory Neurons.
PLoS Comput. Biol., 2014

2013
Physical principles for scalable neural recording.
Frontiers Comput. Neurosci., 2013