Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Language Models Learn to Mislead Humans via RLHF.

Jiaxin Wen

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas.

Jordan Lee Boyd-Graber

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Spontaneous Reward Hacking in Iterative Self-Refinement.

CoRR, 2024

LLM Evaluators Recognize and Favor Their Own Generations.

Arjun Panickssery

Samuel R. Bowman

Shi Feng

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Large Language Models Help Humans Verify Truthfulness - Except When They Are Convincingly Wrong.

Jordan L. Boyd-Graber

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students.

Matthew Shu

Nishant Balepur

Shi Feng

Jordan L. Boyd-Graber

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick.

Nishant Balepur

Matthew Shu

Alexander Miserlis Hoyle

Alison Robey

Shi Feng

Seraphina Goldfarb-Tarrant

Jordan L. Boyd-Graber

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

Learning Human-Compatible Representations for Case-Based Decision Support.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Machine Explanations and Human Understanding.

Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Active Example Selection for In-Context Learning.

Yiming Zhang

Shi Feng

Chenhao Tan

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning to Explain Selectively: A Case Study on Question Answering.

Shi Feng

Jordan L. Boyd-Graber

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Calibrate Before Use: Improving Few-Shot Performance of Language Models.

CoRR, 2021

Concealed Data Poisoning Attacks on NLP Models.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Calibrate Before Use: Improving Few-shot Performance of Language Models.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Customizing Triggers with Concealed Data Poisoning.

CoRR, 2020

2019

Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples.

Jordan L. Boyd-Graber

Trans. Assoc. Comput. Linguistics, 2019

Universal Adversarial Triggers for NLP.

CoRR, 2019

Quizbowl: The Case for Incremental Question Answering.

Jordan L. Boyd-Graber

CoRR, 2019

What Can AI Do for Me?: Evaluating Machine Learning Interpretations in Cooperative Play.

Shi Feng

Jordan L. Boyd-Graber

Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019

Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation.

Proceedings of the 36th International Conference on Machine Learning, 2019

Universal Adversarial Triggers for Attacking and Analyzing NLP.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Misleading Failures of Partial-input Baselines.

Shi Feng

Eric Wallace

Jordan L. Boyd-Graber

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions.

Eric Wallace