Mantas Mazeika

According to our database, Mantas Mazeika authored at least 36 papers between 2018 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

Aggressive Compression Enables LLM Weight Theft.

CoRR, January, 2026

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models.

CoRR, October, 2025

Remote Labor Index: Measuring AI Automation of Remote Work.

CoRR, October, 2025

A Definition of AGI.

CoRR, October, 2025

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes.

CoRR, October, 2025

TextQuests: How Good are LLMs at Text-Based Video Games?

CoRR, July, 2025

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems.

CoRR, March, 2025

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.

CoRR, February, 2025

International AI Safety Report.

CoRR, January, 2025

Humanity's Last Exam.

CoRR, January, 2025

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Tamper-Resistant Safeguards for Open-Weight LLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

International Scientific Report on the Safety of Advanced AI (Interim Report).

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Trans. Mach. Learn. Res., 2023

Representation Engineering: A Top-Down Approach to AI Transparency.

An Overview of Catastrophic AI Risks.

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

X-Risk Analysis for AI Research.

Forecasting Future World Events With Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection.

Proceedings of the International Conference on Machine Learning, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.

Proceedings of the International Conference on Machine Learning, 2022

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

The Trojan Detection Challenge.

Proceedings of the NeurIPS 2022 Competition Track, 2022

What Would Jiminy Cricket Do? Towards Agents That Behave Morally.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Coding Challenge Competence With APPS.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Massive Multitask Language Understanding.

Proceedings of the 9th International Conference on Learning Representations, 2021

A Benchmark for Anomaly Segmentation.

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Pre-Training Can Improve Model Robustness and Uncertainty.

Proceedings of the 36th International Conference on Machine Learning, 2019

Deep Anomaly Detection with Outlier Exposure.

Proceedings of the 7th International Conference on Learning Representations, 2019

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
