Andy Arditi

According to our database, Andy Arditi authored at least 9 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs.
CoRR, December, 2025

Real-Time Detection of Hallucinated Entities in Long-Form Generation.
CoRR, September, 2025

Persona Vectors: Monitoring and Controlling Character Traits in Language Models.
CoRR, July, 2025

Adversarial Manipulation of Reasoning Models using Internal Representations.
CoRR, July, 2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.
CoRR, June, 2025

Inverse Scaling in Test-Time Compute.
Trans. Mach. Learn. Res., 2025

2024
Refusal in Language Models Is Mediated by a Single Direction.
CoRR, 2024

Refusal in Language Models Is Mediated by a Single Direction.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2022
A Framework for Single-Item NFT Auction Mechanism Design.
Proceedings of the 2022 ACM CCS Workshop on Decentralized Finance and Security, 2022

