Nouha Dziri
According to our database1,
Nouha Dziri
authored at least 48 papers
between 2018 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety.
CoRR, July, 2025
A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety.
CoRR, June, 2025
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization.
CoRR, June, 2025
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis.
CoRR, May, 2025
CoRR, April, 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective.
CoRR, February, 2025
Trans. Mach. Learn. Res., 2025
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
2024
CoRR, 2024
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
CoRR, 2024
CoRR, 2024
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting.
CoRR, 2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
Trans. Assoc. Comput. Linguistics, 2022
Trans. Assoc. Comput. Linguistics, 2022
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
2019
Proceedings of the 2019 Workshop on Widening NLP@ACL 2019, Florence, Italy, July 28, 2019, 2019
2018
CoRR, 2018
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018