Rohin Shah

Orcid: 0000-0002-0656-2800

According to our database, Rohin Shah authored at least 28 papers between 2014 and 2024.

Bibliography

2024
Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

AtP*: An efficient and scalable method for localizing LLM behaviour to components.
CoRR, 2024

2023
Challenges with unsupervised LLM knowledge discovery.
CoRR, 2023

Explaining grokking through circuit efficiency.
CoRR, 2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.
CoRR, 2023

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition.
CoRR, 2023

BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SIRL: Similarity-based Implicit Representation Learning.
Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 2023

2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.
CoRR, 2022

An Empirical Investigation of Representation Learning for Imitation.
CoRR, 2022

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback.
CoRR, 2022

2021
The MineRL BASALT Competition on Learning from Human Feedback.
CoRR, 2021

Combining Reward Information from Multiple Sources.
CoRR, 2021

Optimal Policies Tend To Seek Power.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021


An Empirical Investigation of Representation Learning for Imitation.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Learning What To Do by Simulating the Past.
Proceedings of the 9th International Conference on Learning Representations, 2021

Evaluating the Robustness of Collaborative Agents.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020
Extracting and Using Preference Information from the State of the World.
PhD thesis, 2020

The MAGICAL Benchmark for Robust Imitation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Choice Set Misspecification in Reward Inference.
Proceedings of the Workshop on Artificial Intelligence Safety 2020 co-located with the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), 2020

2019
On the Utility of Learning about Humans for Human-AI Coordination.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference.
Proceedings of the 36th International Conference on Machine Learning, 2019

Preferences Implicit in the State of the World.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Active Inverse Reward Design.
CoRR, 2018

2016
SIMPL: A DSL for Automatic Specialization of Inference Algorithms.
CoRR, 2016

2014
Chlorophyll: synthesis-aided compiler for low-power spatial architectures.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

