Vikrant Varma

According to our database, Vikrant Varma authored at least 11 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
An Approach to Technical AGI Safety and Security.
CoRR, April, 2025

MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking.
CoRR, January, 2025

2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2.
CoRR, 2024

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders.
CoRR, 2024

Improving Dictionary Learning with Gated Sparse Autoencoders.
CoRR, 2024

Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Challenges with unsupervised LLM knowledge discovery.
CoRR, 2023

Explaining grokking through circuit efficiency.
CoRR, 2023

2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.
CoRR, 2022

Safe Deep RL in 3D Environments using Human Feedback.
CoRR, 2022

2020
Imitating Interactive Intelligence.
CoRR, 2020
