Pankayaraj Pathmanathan
According to our database, Pankayaraj Pathmanathan authored at least 10 papers between 2024 and 2026.
Bibliography
2026
Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model.
CoRR, April, 2026
Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling.
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation.
CoRR, December, 2025
Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling.
CoRR, July, 2025
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025
2024
AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment.
CoRR, 2024
CoRR, 2024