Satvik Golechha

Orcid: 0009-0000-5274-1060

According to our database, Satvik Golechha authored at least 13 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Building Better Deception Probes Using Targeted Instruction Pairs.
CoRR, February, 2026

2025
ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language.
CoRR, December, 2025

Auditing Games for Sandbagging.
CoRR, December, 2025

Who's the Evil Twin? Differential Auditing for Undesired Behavior.
CoRR, August, 2025

CataractBot: An LLM-powered Expert-in-the-Loop Chatbot for Cataract Patients.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., June, 2025

Among Us: A Sandbox for Agentic Deception.
CoRR, April, 2025

Auditing language models for hidden objectives.
CoRR, March, 2025

Modular Training of Neural Networks aids Interpretability.
CoRR, February, 2025

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

2024
Progress Measures for Grokking on Real-world Datasets.
CoRR, 2024

Position Paper: Toward New Frameworks for Studying Model Representations.
CoRR, 2024

NICE: To Optimize In-Context Examples or Not?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2022
Predicting Treatment Adherence of Tuberculosis Patients at Scale.
Proceedings of the Machine Learning for Health, 2022

