Maheep Chaudhary

According to our database, Maheep Chaudhary authored at least 22 papers between 2022 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
CoRR, May, 2026

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy.
CoRR, May, 2026

In-Context Environments Induce Evaluation-Awareness in Language Models.
CoRR, March, 2026

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs.
CoRR, February, 2026

Weight space Detection of Backdoors in LoRA Adapters.
CoRR, February, 2026

Broken Chains: The Cost of Incomplete Reasoning in LLMs.
CoRR, February, 2026

Punctuations and Predicates in Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2026, 2026

2025
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought.
CoRR, November, 2025

Alignment-Constrained Dynamic Pruning for LLMs: Identifying and Preserving Alignment-Critical Circuits.
CoRR, November, 2025

Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis.
CoRR, November, 2025

PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases.
CoRR, September, 2025

Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization.
CoRR, September, 2025

FRIT: Using Causal Importance to Improve Chain-of-Thought Faithfulness.
CoRR, September, 2025

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models.
CoRR, September, 2025

Punctuation and Predicates in Language Models.
CoRR, August, 2025

SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors.
CoRR, May, 2025

Modular Training of Neural Networks aids Interpretability.
CoRR, February, 2025

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability.
J. Mach. Learn. Res., 2025

2024
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small.
CoRR, 2024

MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives.
CoRR, 2023

2022
An Intelligent Recommendation-cum-Reminder System.
Proceedings of the CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), Bangalore, India, January 8, 2022

