Andy Arditi

According to our database, Andy Arditi authored at least 7 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Persona Vectors: Monitoring and Controlling Character Traits in Language Models.
CoRR, July, 2025

Inverse Scaling in Test-Time Compute.
CoRR, July, 2025

Adversarial Manipulation of Reasoning Models using Internal Representations.
CoRR, July, 2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.
CoRR, June, 2025

2024
Refusal in Language Models Is Mediated by a Single Direction.
CoRR, 2024

Refusal in Language Models Is Mediated by a Single Direction.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2022
A Framework for Single-Item NFT Auction Mechanism Design.
Proceedings of the 2022 ACM CCS Workshop on Decentralized Finance and Security, 2022

