Maksym Andriushchenko

According to our database, Maksym Andriushchenko authored at least 38 papers between 2017 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs.
CoRR, September, 2025

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents.
CoRR, June, 2025

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors.
CoRR, June, 2025

Capability-Based Scaling Laws for LLM Red-Teaming.
CoRR, May, 2025

Critical Influence of Overparameterization on Sharpness-aware Minimization.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2025

Is In-Context Learning Sufficient for Instruction Following in LLMs?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Does Refusal Training in LLMs Generalize to the Past Tense?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit.
CoRR, 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
CoRR, 2024

Improving Alignment and Robustness with Circuit Breakers.
CoRR, 2024

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs.
CoRR, 2024

Improving Alignment and Robustness with Circuit Breakers.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Why Do We Need Weight Decay in Modern Deep Learning?
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Layer-wise linear mode connectivity.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Scaling Compute Is Not All You Need for Adversarial Robustness.
CoRR, 2023

The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis.
CoRR, 2023

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sharpness-Aware Minimization Leads to Low-Rank Features.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SGD with Large Step Sizes Learns Sparse Features.
Proceedings of the International Conference on Machine Learning, 2023

A Modern Look at the Relationship between Sharpness and Generalization.
Proceedings of the International Conference on Machine Learning, 2023

2022
On the effectiveness of adversarial training against common corruptions.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2022

Towards Understanding Sharpness-Aware Minimization.
Proceedings of the International Conference on Machine Learning, 2022

ARIA: Adversarially Robust Image Attribution for Content Provenance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Sparse-RS: A Versatile Framework for Query-Efficient Sparse Black-Box Adversarial Attacks.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
RobustBench: a standardized adversarial robustness benchmark.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
RobustBench: a standardized adversarial robustness benchmark.
CoRR, 2020

Understanding and Improving Fast Adversarial Training.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Provably robust boosted decision stumps and trees against adversarial attacks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Provable Robustness of ReLU networks via Maximization of Linear Regions.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Logit Pairing Methods Can Fool Gradient-Based Attacks.
CoRR, 2018

2017
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
