According to our database, Miljan Martic authored at least 7 papers between 2017 and 2019.
Penalizing Side Effects using Stepwise Relative Reachability.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019
Scaling shared model governance via model splitting.
Scalable agent alignment via reward modeling: a research direction.
Measuring and avoiding side effects using relative reachability.
AI Safety Gridworlds.
Deep reinforcement learning from human preferences.
Deep Reinforcement Learning from Human Preferences.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017