Robert Kirk

According to our database, Robert Kirk authored at least 32 papers between 1980 and 2025.

Bibliography

2025
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs.
CoRR, August, 2025

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition.
CoRR, July, 2025

STACK: Adversarial Attacks on LLM Safeguard Pipelines.
CoRR, June, 2025

Existing Large Language Model Unlearning Evaluations Are Inconclusive.
CoRR, June, 2025

Reward Model Overoptimisation in Iterated RLHF.
CoRR, May, 2025

An Example Safety Case for Safeguards Against Misuse.
CoRR, May, 2025

How Do Large Language Monkeys Get Their Power (Laws)?
CoRR, February, 2025

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction.
CoRR, February, 2025

Fundamental Limitations in Defending LLM Finetuning APIs.
CoRR, February, 2025

Investigating Non-Transitivity in LLM-as-a-Judge.
CoRR, February, 2025

Open Problems in Machine Unlearning for AI Safety.
CoRR, January, 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.
Trans. Mach. Learn. Res., 2025

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models.
CoRR, 2024

Analyzing the Generalization and Reliability of Steering Vectors.
CoRR, 2024

Analysing the Generalisation and Reliability of Steering Vectors.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Generalization to New Sequential Decision Making Tasks with In-Context Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Understanding the Effects of RLHF on LLM Generalisation and Diversity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reward Model Ensembles Help Mitigate Overoptimization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning.
J. Artif. Intell. Res., 2023

Leading the Pack: N-player Opponent Shaping.
CoRR, 2023

What Mechanisms Does Knowledge Distillation Distill?
Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, 2023

2022
Domain Generalization for Robust Model-Based Offline Reinforcement Learning.
CoRR, 2022

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions.
CoRR, 2022

2021
A Survey of Generalisation in Deep Reinforcement Learning.
CoRR, 2021

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021


2006
Physicalism and strict implication.
Synth., 2006

1996
How physicalists can avoid reductionism.
Synth., 1996

1981
A timing verification system based on extracted MOS/VLSI circuit parameters.
Proceedings of the 18th Design Automation Conference, 1981

1980
SIDS (A Symbolic Interactive Design System).
Proceedings of the 17th Design Automation Conference, 1980

