Robert Kirk

According to our database, Robert Kirk authored at least 33 papers between 1980 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples.

CoRR, October, 2025

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs.

CoRR, August, 2025

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition.

Andy Zou

Maxwell Lin

Eliot Krzysztof Jones

CoRR, July, 2025

STACK: Adversarial Attacks on LLM Safeguard Pipelines.

Ian R. McKenzie

Oskar John Hollinsworth

CoRR, June, 2025

Existing Large Language Model Unlearning Evaluations Are Inconclusive.

CoRR, June, 2025

Reward Model Overoptimisation in Iterated RLHF.

Lorenz Wolf

Robert Kirk

Mirco Musolesi

CoRR, May, 2025

An Example Safety Case for Safeguards Against Misuse.

CoRR, May, 2025

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction.

CoRR, February, 2025

Fundamental Limitations in Defending LLM Finetuning APIs.

Christian Schröder de Witt

Yarin Gal

CoRR, February, 2025

Open Problems in Machine Unlearning for AI Safety.

José Hernández-Orallo

Mor Geva

Yarin Gal

CoRR, January, 2025

Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities.

Dylan Hadfield-Menell

Trans. Mach. Learn. Res., 2025

Investigating Non-Transitivity in LLM-as-a-Judge.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

How Do Large Language Monkeys Get Their Power (Laws)?

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models.

Laura Ruis

Maximilian Mozes

Juhan Bae

Siddhartha Rao Kamalakara

Dwaraknath Gnaneshwar

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models.

Laura Ruis

Maximilian Mozes

Juhan Bae

Siddhartha Rao Kamalakara

CoRR, 2024

Analyzing the Generalization and Reliability of Steering Vectors.

CoRR, 2024

Analysing the Generalisation and Reliability of Steering Vectors.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Generalization to New Sequential Decision Making Tasks with In-Context Learning.

Sharath Chandra Raparthy

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Understanding the Effects of RLHF on LLM Generalisation and Diversity.

Robert Kirk

Ishita Mediratta

Christoforos Nalmpantis

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Reward Model Ensembles Help Mitigate Overoptimization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

A Survey of Zero-shot Generalisation in Deep Reinforcement Learning.

J. Artif. Intell. Res., 2023

Leading the Pack: N-player Opponent Shaping.

CoRR, 2023

What Mechanisms Does Knowledge Distillation Distill?

Cindy Wu

Ekdeep Singh Lubana

Bruno Kacper Mlodozeniec

Robert Kirk

David Krueger

Proceedings of UniReps: the First Workshop on Unifying Representations in Neural Models, 2023

2022

Domain Generalization for Robust Model-Based Offline Reinforcement Learning.

Alan Clark

Shoaib Ahmed Siddiqui

CoRR, 2022

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions.

CoRR, 2022

2021

A Survey of Generalisation in Deep Reinforcement Learning.

CoRR, 2021

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Insights From the NeurIPS 2021 NetHack Challenge.

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

2006

Physicalism and strict implication.

Robert Kirk

Synth., 2006

1996

How physicalists can avoid reductionism.

Robert Kirk

Synth., 1996

1981

A timing verification system based on extracted MOS/VLSI circuit parameters.

Pauline Ng

Wolfram Glauert

Robert Kirk

Proceedings of the 18th Design Automation Conference, 1981

1980

SIDS (A Symbolic Interactive Design System).

Dave Clary

Robert Kirk

Steve Sapiro

Proceedings of the 17th Design Automation Conference, 1980