Nouha Dziri

According to our database, Nouha Dziri authored at least 48 papers between 2018 and 2025.

Bibliography

2025
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety.
CoRR, July, 2025

A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety.
CoRR, June, 2025

The Singapore Consensus on Global AI Safety Research Priorities.
CoRR, June, 2025

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization.
CoRR, June, 2025

Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis.
CoRR, May, 2025

Climbing the Ladder of Reasoning: What LLMs Can - and Still Can't - Solve after SFT?
CoRR, April, 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective.
CoRR, February, 2025

2 OLMo 2 Furious.
CoRR, January, 2025

Multi-Attribute Constraint Satisfaction via Language Model Rewriting.
Trans. Mach. Learn. Res., 2025

REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

RewardBench: Evaluating Reward Models for Language Modeling.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

To Err Is AI: A Case Study Informing LLM Flaw Reporting Practices.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training.
CoRR, 2024

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation.
CoRR, 2024

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction.
CoRR, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild.
CoRR, 2024

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting.
CoRR, 2024

RewardBench: Evaluating Reward Models for Language Modeling.
CoRR, 2024

A Roadmap to Pluralistic Alignment.
CoRR, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The Art of Saying No: Contextual Noncompliance in Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Position: A Roadmap to Pluralistic Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Generative AI Paradox: "What It Can Create, It May Not Understand".
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Self-Refine: Iterative Refinement with Self-Feedback.
CoRR, 2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Self-Refine: Iterative Refinement with Self-Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Faith and Fate: Limits of Transformers on Compositionality.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Evaluating Open-Domain Question Answering in the Era of Large Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark.
Trans. Assoc. Comput. Linguistics, 2022

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue.
Trans. Assoc. Comput. Linguistics, 2022

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

2021
Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark.
CoRR, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2019
Evaluating Coherence in Dialogue Systems using Entailment.
Proceedings of the 2019 Workshop on Widening NLP@ACL, Florence, Italy, July 28, 2019

2018
Augmenting Neural Response Generation with Context-Aware Topical Attention.
CoRR, 2018

Automatic Dialogue Generation with Expressed Emotions.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
