Miles Turpin

According to our database, Miles Turpin authored at least 7 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.
CoRR, June, 2025

Looking Inward: Language Models Can Learn About Themselves by Introspection.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
Trans. Mach. Learn. Res., 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection.
CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
CoRR, 2024

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought.
CoRR, 2024

2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

