Steven Basart

According to our database¹, Steven Basart authored at least 23 papers between 2019 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Open Technical Problems in Open-Weight AI Model Risk Management.

Trans. Mach. Learn. Res., 2026

2025

Depth-Wise Activation Steering for Honest Language Models.

Gracjan Góral

Marysia Winkels

Steven Basart

CoRR, December, 2025

Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity.

CoRR, October, 2025

Remote Labor Index: Measuring AI Automation of Remote Work.

CoRR, October, 2025

Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models.

José Hernández-Orallo

CoRR, September, 2025

2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.

Ann-Kathrin Dombrowski

Justin Tienken-Harder

Kallol Krishna Karmakar

Steven Basart

Stephen Fitz

Mindy Levine

Ponnurangam Kumaraguru

CoRR, 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Representation Engineering: A Top-Down Approach to AI Transparency.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.

Proceedings of the International Conference on Machine Learning, 2023

2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.

Mohammadreza Mostajabi

Jacob Steinhardt

Dawn Song

Proceedings of the International Conference on Machine Learning, 2022

2021

Towards Robustness of Neural Networks.

Steven Basart

CoRR, 2021

Measuring Coding Challenge Competence With APPS.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Mathematical Problem Solving With the MATH Dataset.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Massive Multitask Language Understanding.

Proceedings of the 9th International Conference on Learning Representations, 2021

Aligning AI With Shared Human Values.

Proceedings of the 9th International Conference on Learning Representations, 2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Natural Adversarial Examples.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019

A Benchmark for Anomaly Segmentation.

Dan Hendrycks

Steven Basart

Mantas Mazeika

Mohammadreza Mostajabi

Jacob Steinhardt

Dawn Song

CoRR, 2019

DIODE: A Dense Indoor and Outdoor DEpth Dataset.

Mohammadreza Mostajabi

Steven Basart

Matthew R. Walter

Gregory Shakhnarovich

CoRR, 2019
