Nouha Dziri

According to our database, Nouha Dziri authored at least 48 papers between 2018 and 2025.

Bibliography

2025
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety.
CoRR, July, 2025

A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety.
CoRR, June, 2025

The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization.
CoRR, June, 2025

Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis.
CoRR, May, 2025

Climbing the Ladder of Reasoning: What LLMs Can - and Still Can't - Solve after SFT?
CoRR, April, 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective.
CoRR, February, 2025

2 OLMo 2 Furious.
CoRR, January, 2025

Multi-Attribute Constraint Satisfaction via Language Model Rewriting.
Trans. Mach. Learn. Res., 2025

REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

RewardBench: Evaluating Reward Models for Language Modeling.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training.
CoRR, 2024

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation.
CoRR, 2024

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
CoRR, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.
CoRR, 2024

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting.
CoRR, 2024

RewardBench: Evaluating Reward Models for Language Modeling.
CoRR, 2024

A Roadmap to Pluralistic Alignment.
CoRR, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The Art of Saying No: Contextual Noncompliance in Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Position: A Roadmap to Pluralistic Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Generative AI Paradox: "What It Can Create, It May Not Understand".
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Self-Refine: Iterative Refinement with Self-Feedback.
CoRR, 2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Self-Refine: Iterative Refinement with Self-Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Faith and Fate: Limits of Transformers on Compositionality.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Evaluating Open-Domain Question Answering in the Era of Large Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark.
Trans. Assoc. Comput. Linguistics, 2022

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue.
Trans. Assoc. Comput. Linguistics, 2022

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

2021
Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark.
CoRR, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2019
Evaluating Coherence in Dialogue Systems using Entailment.
Proceedings of the 2019 Workshop on Widening NLP@ACL, Florence, Italy, July 28, 2019

2018
Augmenting Neural Response Generation with Context-Aware Topical Attention.
CoRR, 2018

Automatic Dialogue Generation with Expressed Emotions.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
