Jared Kaplan

According to our database¹, Jared Kaplan authored at least 43 papers between 2007 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Timeline

Book In proceedings Article PhD thesis Dataset Other

Links

Bibliography

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks.

[BibT_eX]

CoRR, January, 2026

Reasoning Models Don't Always Say What They Think.

[BibT_eX]

CoRR, May, 2025

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations.

[BibT_eX]

CoRR, March, 2025

Forecasting Rare Language Model Behaviors.

[BibT_eX]

CoRR, February, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

[BibT_eX]

CoRR, January, 2025

Alignment faking in large language models.

[BibT_eX]

Clio: Privacy-Preserving Insights into Real-World AI Use.

[BibT_eX]

Sabotage Evaluations for Frontier Models.

[BibT_eX]

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.

[BibT_eX]

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

[BibT_eX]

Many-shot Jailbreaking.

[BibT_eX]

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

[BibT_eX]

Trans. Mach. Learn. Res., 2023

Evaluating and Mitigating Discrimination in Language Model Decisions.

[BibT_eX]

Specific versus General Principles for Constitutional AI.

[BibT_eX]

Studying Large Language Model Generalization with Influence Functions.

[BibT_eX]

Measuring Faithfulness in Chain-of-Thought Reasoning.

[BibT_eX]

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.

[BibT_eX]

Towards Measuring the Representation of Subjective Global Opinions in Language Models.

[BibT_eX]

The Capacity for Moral Self-Correction in Large Language Models.

[BibT_eX]

Discovering Language Model Behaviors with Model-Written Evaluations.

[BibT_eX]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Scaling Laws from the Data Manifold Dimension.

[BibT_eX]

J. Mach. Learn. Res., 2022

Discovering Language Model Behaviors with Model-Written Evaluations.

[BibT_eX]

Constitutional AI: Harmlessness from AI Feedback.

[BibT_eX]

Measuring Progress on Scalable Oversight for Large Language Models.

[BibT_eX]

In-context Learning and Induction Heads.

[BibT_eX]

Toy Models of Superposition.

[BibT_eX]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.

[BibT_eX]

Language Models (Mostly) Know What They Know.

[BibT_eX]

Scaling Laws and Interpretability of Learning from Repeated Data.

[BibT_eX]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.

[BibT_eX]

Predictability and Surprise in Large Generative Models.

[BibT_eX]

Predictability and Surprise in Large Generative Models.

[BibT_eX]

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

A General Language Assistant as a Laboratory for Alignment.

[BibT_eX]

Evaluating Large Language Models Trained on Code.

[BibT_eX]

Explaining Neural Scaling Laws.

[BibT_eX]

Scaling Laws for Transfer.

[BibT_eX]

Data and Parameter Scaling Laws for Neural Machine Translation.

[BibT_eX]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Scaling Laws for Autoregressive Generative Modeling.

[BibT_eX]

A Neural Scaling Law from the Dimension of the Data Manifold.

[BibT_eX]

Scaling Laws for Neural Language Models.

[BibT_eX]

Language Models are Few-Shot Learners.

[BibT_eX]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

An Empirical Model of Large-Batch Training.

[BibT_eX]

Explaining Debugging Strategies to End-User Programmers.

[BibT_eX]

Proceedings of the 2007 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007), 2007

Jared Kaplan

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...