Jerry Wei

According to our database, Jerry Wei authored at least 10 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Segment-Level Coherence for Robust Harmful Intent Probing in LLMs.
CoRR, April, 2026

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning.
CoRR, March, 2026

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks.
CoRR, January, 2026

2025
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.
CoRR, January, 2025

2024
Best Practices and Lessons Learned on Synthetic Data for Language Models.
CoRR, 2024

Long-form factuality in large language models.
CoRR, 2024

Non-robustness of diffusion estimates on networks with measurement error.
CoRR, 2024

Long-form factuality in large language models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2021
Adapting Security Warnings to Counter Online Disinformation.
Proceedings of the 30th USENIX Security Symposium, 2021

2020
NewB: 200,000+ Sentences for Political Bias Detection.
CoRR, 2020
