Wenjie Wang

Orcid: 0000-0002-7559-6211

Affiliations:
  • ShanghaiTech University, China
  • Emory University, USA (Ph.D.)


According to our database, Wenjie Wang authored at least 18 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures.
CoRR, June, 2025

DR.GAP: Mitigating Bias in Large Language Models using Gender-Aware Prompting with Demonstration and Reasoning.
CoRR, February, 2025

Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models.
CoRR, February, 2025

Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training.
CoRR, February, 2025

Don't Say No: Jailbreaking LLM by Suppressing Refusal.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models.
CoRR, 2024

Defending Jailbreak Attack in VLMs via Cross-modality Information Detector.
CoRR, 2024

Don't Say No: Jailbreaking LLM by Suppressing Refusal.
CoRR, 2024

Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing.
CoRR, 2024

LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Wasserstein Adversarial Examples on Univariant Time Series Data.
CoRR, 2023

Demo: Certified Robustness on Toolformer.
Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023

2022
Generating Adversarial Examples With Distance Constrained Adversarial Imitation Networks.
IEEE Trans. Dependable Secur. Comput., 2022

2021
Certified Robustness to Word Substitution Attack with Differential Privacy.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

2020
RADAR: Recurrent Autoencoder Based Detector for Adversarial Examples on Temporal EHR.
Proceedings of the Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, 2020

Utilizing Multimodal Feature Consistency to Detect Adversarial Examples on Clinical Summaries.
Proceedings of the 3rd Clinical Natural Language Processing Workshop, 2020

