Luke Marks

According to our database¹, Luke Marks authored at least 8 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

DiFR: Inference Verification Despite Nondeterminism.

CoRR, November, 2025

Output Supervision Can Obfuscate the Chain of Thought.

Alexander Matt Turner

CoRR, November, 2025

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

2024

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders.

CoRR, 2024

Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations.

Luke Marks

CoRR, 2024

Interpreting Learned Feedback Patterns in Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.

CoRR, 2023
