Jacob Hilton

ORCID: 0009-0002-2002-5929

According to our database, Jacob Hilton authored at least 19 papers between 2016 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

Estimating the expected output of wide random MLPs more efficiently than sampling.

CoRR, May, 2026

2025

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data.

CoRR, July, 2025

Backdoor Defense, Learnability and Obfuscation.

Proceedings of the 16th Innovations in Theoretical Computer Science Conference, 2025

Estimating the Probabilities of Rare Outputs in Language Models.

Gabriel Wu

Jacob Hilton

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Obfuscated Activations Bypass LLM Latent-Space Defenses.

CoRR, 2024

Towards a Law of Iterated Expectations for Heuristic Estimators.

CoRR, 2024

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Bartlomiej Bojanowski

Christopher D. Manning

Daniel Moseguí González

Eunice Engefu Manyasi

Evgenii Zheltonozhskii

Fanyue Xia

Fatemeh Siar

Fernando Martínez-Plumed

Giambattista Parascandolo

Giorgio Mariani

Gloria Wang

Gonzalo Jaimovitch-López

Jaime Fernández Fisac

Jascha Sohl-Dickstein

José Hernández-Orallo

Karthik Gopalakrishnan

Lidia Contreras Ochando

Louis-Philippe Morency

María José Ramírez-Quintana

Michael I. Ivanitskiy

Neta Gur-Ari Krakover

Nitish Shirish Keskar

Pablo Antonio Moreno Casares

Pegah Alipoormolabashi

Shyamolima (Shammie) Debnath

Sneha Priscilla Makini

Yadollah Yaghoobzadeh

Trans. Mach. Learn. Res., 2023

Scaling laws for single-agent reinforcement learning.

Jacob Hilton

Jie Tang

John Schulman

CoRR, 2023

Scaling Laws for Reward Model Overoptimization.

Leo Gao

John Schulman

Jacob Hilton

Proceedings of the International Conference on Machine Learning, 2023

2022

Teaching Models to Express Their Uncertainty in Words.

Stephanie Lin

Jacob Hilton

Owain Evans

Trans. Mach. Learn. Res., 2022

Training language models to follow instructions with human feedback.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Batch size-invariance for policy optimization.

Jacob Hilton

Karl Cobbe

John Schulman

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TruthfulQA: Measuring How Models Mimic Human Falsehoods.

Stephanie Lin

Jacob Hilton

Owain Evans

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

WebGPT: Browser-assisted question-answering with human feedback.

CoRR, 2021

Training Verifiers to Solve Math Word Problems.

CoRR, 2021

Phasic Policy Gradient.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark.

Proceedings of the NeurIPS 2020 Competition and Demonstration Track, 2020

Leveraging Procedural Generation to Benchmark Reinforcement Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

2016

The Topological Pigeonhole Principle for Ordinals.

Jacob Hilton

J. Symb. Log., 2016
