Carson Denison

According to our database, Carson Denison authored at least 11 papers between 2023 and 2025.

Bibliography

2025
Reasoning Models Don't Always Say What They Think.
CoRR, May, 2025

Auditing language models for hidden objectives.
CoRR, March, 2025

2024
Alignment faking in large language models.
CoRR, 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Gradient-Based Language Model Red Teaming.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy.
J. Artif. Intell. Res., 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
CoRR, 2023

Private Ad Modeling with DP-SGD.
Proceedings of the Workshop on Data Mining for Online Advertising (AdKDD 2023) co-located with the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023), 2023

