Erik Jones

According to our database, Erik Jones authored at least 20 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
AI Organizations are More Effective but Less Aligned than Individual Agents.
CoRR, April, 2026

Abstractive Red-Teaming of Language Model Character.
CoRR, February, 2026

Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs.
CoRR, January, 2026

2025
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs.
CoRR, December, 2025

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples.
CoRR, October, 2025

Forecasting Rare Language Model Behaviors.
CoRR, February, 2025

How Do Large Language Monkeys Get Their Power (Laws)?
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Adversaries Can Misuse Combinations of Safe Models.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Best-of-N Jailbreaking.
CoRR, 2024

Feedback Loops With Language Models Drive In-Context Reward Hacking.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Teaching Language Models to Hallucinate Less with Synthetic Tasks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Orca 2: Teaching Small Language Models How to Reason.
CoRR, 2023

Mass-Producing Failures of Multimodal Systems with Language Models.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Automatically Auditing Large Language Models via Discrete Optimization.
Proceedings of the International Conference on Machine Learning, 2023

2022
Capturing Failures of Large Language Models via Human Cognitive Biases.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Selective Classification Can Magnify Disparities Across Groups.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Impact of a deep learning assistant on the histopathologic classification of liver cancer.
npj Digital Medicine, 2020

Robust Encodings: A Framework for Combating Adversarial Typos.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

