Judd Rosenblatt

According to our database, Judd Rosenblatt authored at least 7 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs.
CoRR, February, 2026

Endogenous Resistance to Activation Steering in Language Models.
CoRR, February, 2026

2025
Large Language Models Report Subjective Experience Under Self-Referential Processing.
CoRR, October, 2025

Momentum Point-Perplexity Mechanics in Large Language Models.
CoRR, August, 2025

2024
Towards Safe and Honest AI Agents with Neural Self-Other Overlap.
CoRR, 2024

Unexpected Benefits of Self-Modeling in Neural Systems.
CoRR, 2024

Rethinking harmless refusals when fine-tuning foundation models.
CoRR, 2024

