Maheep Chaudhary

According to our database, Maheep Chaudhary authored at least 22 papers between 2022 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Jeanmely Rojas Nunez

,

,

,

Nomgondalai Amgalanbaatar

,

,

,

Maheep Chaudhary

CoRR, May, 2026

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy.

,

,

,

,

,

Maheep Chaudhary

CoRR, May, 2026

In-Context Environments Induce Evaluation-Awareness in Language Models.

Maheep Chaudhary

CoRR, March, 2026

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs.

Chun Yan Ryan Kan

,

,

,

,

,

,

Maheep Chaudhary

CoRR, February, 2026

Weight space Detection of Backdoors in LoRA Adapters.

David Puertolas Merenciano

,

Ekaterina Vasyagina

,

,

,

,

Javier Ferrando

,

Maheep Chaudhary

CoRR, February, 2026

Broken Chains: The Cost of Incomplete Reasoning in LLMs.

,

Gaurav Purushothaman

,

,

,

,

,

,

Maheep Chaudhary

CoRR, February, 2026

Punctuations and Predicates in Language Models.

Sonakshi Chauhan

,

Maheep Chaudhary

,

,

Samuel Nellessen

,

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025

SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought.

,

,

,

Shashank Kesineni

,

,

,

,

,

Maheep Chaudhary

CoRR, November, 2025

Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits.

,

Gabrielle Gervacio

,

,

,

,

,

,

Maheep Chaudhary

CoRR, November, 2025

Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis.

,

,

,

,

,

Parham Sharafoleslami

,

Maheep Chaudhary

CoRR, November, 2025

PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases.

Sri Vatsa Vuddanti

,

,

Satwik Kumar Chittiprolu

,

,

,

,

Maheep Chaudhary

CoRR, September, 2025

Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization.

,

,

,

,

Maheep Chaudhary

CoRR, September, 2025

FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness.

,

,

Saksham Uboweja

,

Adiliia Uzdenova

,

,

,

,

,

,

Maheep Chaudhary

CoRR, September, 2025

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models.

Maheep Chaudhary

,

,

,

Nishith Shankar

,

,

,

,

,

CoRR, September, 2025

Punctuation and Predicates in Language Models.

Sonakshi Chauhan

,

Maheep Chaudhary

,

,

Samuel Nellessen

,

CoRR, August, 2025

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors.

Maheep Chaudhary

,

CoRR, May, 2025

Modular Training of Neural Networks aids Interpretability.

Satvik Golechha

,

Maheep Chaudhary

,

,

Alessandro Abate

,

CoRR, February, 2025

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability.

,

Duligur Ibeling

,

,

Maheep Chaudhary

,

Sonakshi Chauhan

,

,

,

,

Noah D. Goodman

,

Christopher Potts

,

J. Mach. Learn. Res., 2025

2024

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small.

Maheep Chaudhary

,

CoRR, 2024

MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification.

Siddhant Bikram Shah

,

Shuvam Shiwakoti

,

Maheep Chaudhary

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives.

,

Maheep Chaudhary

,

CoRR, 2023

2022

An Intelligent Recommendation-cum-Reminder System.

,

Maheep Chaudhary

,

Chandresh Kumar Maurya

,

Proceedings of the CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), Bangalore, India, January 8, 2022
