Mary Phuong

According to our database, Mary Phuong authored at least 13 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring.
CoRR, August, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety.
CoRR, July, 2025

CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring.
CoRR, May, 2025

Evaluating Frontier Models for Stealth and Situational Awareness.
CoRR, May, 2025

From Stability to Inconsistency: A Study of Moral Preferences in LLMs.
CoRR, April, 2025

2024
Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

2023
Model evaluation for extreme risks.
CoRR, 2023

2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.
CoRR, 2022

Formal Algorithms for Transformers.
CoRR, 2022

2021
The inductive bias of ReLU networks on orthogonally separable data.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Functional vs. parametric equivalence of ReLU networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Towards Understanding Knowledge Distillation.
Proceedings of the 36th International Conference on Machine Learning, 2019

Distillation-Based Training for Multi-Exit Architectures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

