Maheep Chaudhary
According to our database, Maheep Chaudhary authored at least 22 papers between 2022 and 2026.
Bibliography
2026
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
CoRR, May, 2026
Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy.
CoRR, May, 2026
CoRR, March, 2026
CoRR, February, 2026
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026
2025
CoRR, November, 2025
Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits.
CoRR, November, 2025
CoRR, November, 2025
CoRR, September, 2025
CoRR, September, 2025
CoRR, September, 2025
CoRR, September, 2025
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors.
CoRR, May, 2025
J. Mach. Learn. Res., 2025
2024
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small.
CoRR, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
2023
Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives.
CoRR, 2023
2022
Proceedings of the CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), Bangalore, India, January 8, 2022