Severin Field

According to our database, Severin Field authored at least 4 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts.
CoRR, February, 2025

2024
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks.
CoRR, 2024

Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language.
CoRR, 2024

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals.
CoRR, 2024

