Jiongxiao Wang

According to our database, Jiongxiao Wang authored at least 12 papers between 2022 and 2024.

Bibliography

2024
Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment.
CoRR, 2024

Preference Poisoning Attacks on Reward Model Learning.
CoRR, 2024

2023
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations.
CoRR, 2023

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models.
CoRR, 2023

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback.
CoRR, 2023

Adversarial Demonstration Attacks on Large Language Models.
CoRR, 2023

On the Exploitability of Instruction Tuning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification.
Proceedings of the International Conference on Machine Learning, 2023

DensePure: Understanding Diffusion Models for Adversarial Robustness.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Defending against Adversarial Audio via Diffusion Model.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
DensePure: Understanding Diffusion Models towards Adversarial Robustness.
CoRR, 2022

Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack.
Proceedings of the International Conference on Machine Learning, 2022

