Teun van der Weij

According to our database, Teun van der Weij authored at least 12 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Stress Testing Deliberative Alignment for Anti-Scheming Training.
CoRR, September, 2025

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents.
CoRR, June, 2025

The Elicitation Game: Evaluating Capability Elicitation Techniques.
CoRR, February, 2025

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

The Elicitation Game: Evaluating Capability Elicitation Techniques.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

AI Sandbagging: Language Models can Strategically Underperform on Evaluations.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models.
CoRR, 2024

AI Sandbagging: Language Models can Strategically Underperform on Evaluations.
CoRR, 2024

Extending Activation Steering to Broad Skills and Multiple Behaviours.
CoRR, 2024

2023
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios.
CoRR, 2023

2021
Runtime Prediction of Filter Unsupervised Feature Selection Methods.
Res. Comput. Sci., 2021

