Luke Marks

According to our database, Luke Marks authored at least 8 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
DiFR: Inference Verification Despite Nondeterminism.
CoRR, November, 2025

Output Supervision Can Obfuscate the Chain of Thought.
CoRR, November, 2025

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders.
CoRR, 2024

Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations.
CoRR, 2024

Interpreting Learned Feedback Patterns in Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.
CoRR, 2023
