Andy Arditi

According to our database¹, Andy Arditi authored at least 10 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs.

CoRR, December, 2025

Real-Time Detection of Hallucinated Entities in Long-Form Generation.

CoRR, September, 2025

Persona Vectors: Monitoring and Controlling Character Traits in Language Models.

CoRR, July, 2025

Adversarial Manipulation of Reasoning Models using Internal Representations.

Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi

CoRR, July, 2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning.

CoRR, June, 2025

Inverse Scaling in Test-Time Compute.

Jacob Goldman-Wetzler

Trans. Mach. Learn. Res., 2025

Structural Causal Bandits under Markov Equivalence.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024

Refusal in Language Models Is Mediated by a Single Direction.

CoRR, 2024

Refusal in Language Models Is Mediated by a Single Direction.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2022

A Framework for Single-Item NFT Auction Mechanism Design.

Proceedings of the 2022 ACM CCS Workshop on Decentralized Finance and Security, 2022