Phillip Guo
According to our database, Phillip Guo authored at least 8 papers between 2022 and 2025.
Bibliography
2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025
2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization.
CoRR, 2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024
2023
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching.
CoRR, 2023
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023
2022
Proceedings of the Winter Simulation Conference, 2022