Chris Olah

According to our database, Chris Olah authored at least 20 papers between 2015 and 2023.

Bibliography

2023
The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023


2022
Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

In-context Learning and Induction Heads.
CoRR, 2022

Toy Models of Superposition.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Scaling Laws and Interpretability of Learning from Repeated Data.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
CoRR, 2022


2021
A General Language Assistant as a Laboratory for Alignment.
CoRR, 2021

2018
Is Generator Conditioning Causally Related to GAN Performance?
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Conditional Image Synthesis with Auxiliary Classifier GANs.
Proceedings of the 34th International Conference on Machine Learning, 2017

Changing Model Behavior at Test-time Using Reinforcement Learning.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Concrete Problems in AI Safety.
CoRR, 2016

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.
CoRR, 2016

2015
Document Embedding with Paragraph Vectors.
CoRR, 2015

