Phillip Guo

According to our database, Phillip Guo authored at least 8 papers between 2022 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
Trans. Mach. Learn. Res., 2025

2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization.
CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR, 2024

2023
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching.
CoRR, 2023

Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR, 2023

Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

2022
Bandit-Based Multi-Start Strategies for Global Continuous Optimization.
Proceedings of the Winter Simulation Conference, 2022
