Zeming Wei

Orcid: 0000-0002-9867-3929

According to our database, Zeming Wei authored at least 31 papers between 2020 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection.
CoRR, August, 2025

Identifying and Understanding Cross-Class Features in Adversarial Training.
CoRR, June, 2025

ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs.
CoRR, June, 2025

Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives.
CoRR, May, 2025

Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization.
CoRR, May, 2025

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval.
CoRR, May, 2025

Advancing LLM Safe Alignment with Safety Representation Ranking.
CoRR, May, 2025

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians.
CoRR, April, 2025

Towards the Worst-case Robustness of Large Language Models.
CoRR, January, 2025

Robust and Efficient Watermarking of Large Language Models Using Error Correction Codes.
Proc. Priv. Enhancing Technol., 2025

Boosting Jailbreak Attack with Momentum.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Exploring the Robustness of In-Context Learning with Noisy Labels.
Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

2024
Weighted automata extraction and explanation of recurrent neural networks for natural language tasks.
J. Log. Algebraic Methods Program., January, 2024

Automata Extraction from Transformers.
CoRR, 2024

Towards General Conceptual Model Editing via Adversarial Representation Engineering.
CoRR, 2024

Studious Bob Fight Back Against Jailbreaking via Prompt Adversarial Tuning.
CoRR, 2024

MILE: A Mutation Testing Framework of In-Context Learning Systems.
Proceedings of the Dependable Software Engineering. Theories, Tools, and Applications, 2024

Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Fight Back Against Jailbreaking via Prompt Adversarial Tuning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

A Theoretical Understanding of Self-Correction through In-context Alignment.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

On the Duality Between Sharpness-Aware Minimization and Adversarial Training.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning.
Proceedings of the Computer Security - ESORICS 2024, 2024

2023
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations.
CoRR, 2023

On the Relation between Sharpness-Aware Minimization and Adversarial Robustness.
CoRR, 2023

Using Z3 for Formal Modeling and Verification of FNN Global Robustness.
CoRR, 2023

Using Z3 for Formal Modeling and Verification of FNN Global Robustness (S).
Proceedings of the 35th International Conference on Software Engineering and Knowledge Engineering, 2023

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CFA: Class-Wise Calibrated Fair Adversarial Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Soil Data Storage Framework based on Blockchain and Improved Merkle Mountain Range.
Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, 2023

2022
Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages.
Proceedings of the Formal Methods and Software Engineering, 2022

2020
RegiNet: Gradient guided multispectral image registration using convolutional neural networks.
Neurocomputing, 2020

