Adrià Garriga-Alonso
According to our database, Adrià Garriga-Alonso authored at least 24 papers between 2019 and 2025.
Bibliography
2025
Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban.
CoRR, June, 2025
CoRR, May, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
2022
Proceedings of the Uncertainty in Artificial Intelligence, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
2021
BNNpriors: A library for Bayesian neural network inference with different prior distributions.
Softw. Impacts, 2021
BNNpriors: A library for Bayesian neural network inference with different prior distributions.
CoRR, 2021
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021
2020
2019
Proceedings of the 7th International Conference on Learning Representations, 2019