Xiaogeng Liu

Orcid: 0009-0008-7677-2310

According to our database, Xiaogeng Liu authored at least 26 papers between 2021 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines.
CoRR, July, 2025

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents.
CoRR, June, 2025

OET: Optimization-based prompt injection Evaluation Toolkit.
CoRR, May, 2025

Doxing via the Lens: Revealing Privacy Leakage in Image Geolocation for Agentic Multi-Modal Large Reasoning Model.
CoRR, April, 2025

CVE-Bench: Benchmarking LLM-based Software Engineering Agent's Ability to Repair Real-World CVE Vulnerabilities.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process.
Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Can Watermarks be Used to Detect LLM IP Infringement For Free?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models.
CoRR, 2024

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character.
CoRR, 2024

JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks.
CoRR, 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models.
CoRR, 2024

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models.
Proceedings of the 33rd USENIX Security Symposium, 2024

Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability.
Proceedings of the IEEE Symposium on Security and Privacy, 2024

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AdaShield: Safeguarding Multimodal Large Language Models from Structure-Based Attack via Adaptive Shield Prompting.
Proceedings of Computer Vision - ECCV 2024, 2024

2023
DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions.
CoRR, 2023

Why Does Little Robustness Help? Understanding Adversarial Transferability From Surrogate Training.
CoRR, 2023

PointCRT: Detecting Backdoor in 3D Point Cloud via Corruption Robustness.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Towards Efficient Data-Centric Robust Machine Learning with Noise-based Augmentation.
CoRR, 2022

Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
AdvHash: Set-to-set Targeted Attack on Deep Hashing with One Single Adversarial Patch.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
