Zeming Wei

CoRR, September, 2025

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize.

CoRR, September, 2025

Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection.

CoRR, August, 2025

Identifying and Understanding Cross-Class Features in Adversarial Training.

Yiwen Guo

Yisen Wang

CoRR, June, 2025

ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs.

Chengcan Wu

CoRR, June, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives.

CoRR, May, 2025

Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization.

CoRR, May, 2025

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval.

CoRR, May, 2025

Advancing LLM Safe Alignment with Safety Representation Ranking.

CoRR, May, 2025

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians.

CoRR, April, 2025

Towards the Worst-case Robustness of Large Language Models.

CoRR, January, 2025

Robust and Efficient Watermarking of Large Language Models Using Error Correction Codes.

Proc. Priv. Enhancing Technol., 2025

Boosting Jailbreak Attack with Momentum.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Exploring the Robustness of In-Context Learning with Noisy Labels.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024

Weighted automata extraction and explanation of recurrent neural networks for natural language tasks.

J. Log. Algebraic Methods Program., January, 2024

Automata Extraction from Transformers.

CoRR, 2024

Towards General Conceptual Model Editing via Adversarial Representation Engineering.

CoRR, 2024

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning.

CoRR, 2024

MILE: A Mutation Testing Framework of In-Context Learning Systems.

Proceedings of the Dependable Software Engineering. Theories, Tools, and Applications, 2024

Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Fight Back Against Jailbreaking via Prompt Adversarial Tuning.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

A Theoretical Understanding of Self-Correction through In-context Alignment.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

On the Duality Between Sharpness-Aware Minimization and Adversarial Training.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning.

Proceedings of the Computer Security - ESORICS 2024, 2024

2023

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations.

Yifei Wang

Yisen Wang

CoRR, 2023

On the Relation between Sharpness-Aware Minimization and Adversarial Robustness.

Jingyu Zhu

CoRR, 2023

Using Z3 for Formal Modeling and Verification of FNN Global Robustness.

CoRR, 2023

Using Z3 for Formal Modeling and Verification of FNN Global Robustness (S).

Proceedings of the 35th International Conference on Software Engineering and Knowledge Engineering, 2023

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CFA: Class-Wise Calibrated Fair Adversarial Training.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Soil Data Storage Framework based on Blockchain and Improved Merkle Mountain Range.

Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, 2023

2022

Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages.

Xiyue Zhang

Proceedings of the Formal Methods and Software Engineering, 2022

2020

RegiNet: Gradient guided multispectral image registration using convolutional neural networks.
