Zimo Qi

According to our database, Zimo Qi authored at least 5 papers between 2024 and 2026.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2026
Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment.
CoRR, March 2026

2025
Moral Self-correction is Not An Innate Capability in Language Models.
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2025

Diagnosing Moral Reasoning Acquisition in Language Models: Pragmatics and Generalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Discourse Heuristics For Paradoxically Moral Self-Correction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2024
Is Moral Self-correction An Innate Capability of Large Language Models? A Mechanistic Analysis to Self-correction.
CoRR, 2024