Dmitrii Volkov

According to our database, Dmitrii Volkov authored at least 7 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
Evaluating AI cyber capabilities with crowdsourced elicitation.
CoRR, May, 2025

Demonstrating specification gaming in reasoning models.
CoRR, February, 2025

Resurrecting saturated LLM benchmarks with adversarial encoding.
CoRR, February, 2025

2024
BadGPT-4o: stripping safety finetuning from GPT models.
CoRR, 2024

Hacking CTFs with Plain Agents.
CoRR, 2024

LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild.
CoRR, 2024

Badllama 3: removing safety finetuning from Llama 3 in minutes.
CoRR, 2024
