Joan Velja

Orcid: 0009-0002-4032-0140

According to our database, Joan Velja authored at least 5 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
When can we trust untrusted monitoring? A safety case sketch across collusion strategies.
CoRR, February, 2026

2025
Modular Training of Neural Networks aids Interpretability.
CoRR, February, 2025

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs.
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2025

2024
'Explaining RL Decisions with Trajectories': A Reproducibility Study.
Trans. Mach. Learn. Res., 2024

Dynamic Vocabulary Pruning in Early-Exit LLMs.
CoRR, 2024

