Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

System 1.x: Learning to Balance Fast and Slow Planning with Language Models.

Swarnadeep Saha

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Interpretable and Controllable Language Models.

Peter Hase

PhD thesis, 2024

INSPIRE: Incorporating Diverse Feature Preferences in Recourse.

Prateek Yadav

Peter Hase

Mohit Bansal

Trans. Mach. Learn. Res., 2024

Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation.

Trans. Mach. Learn. Res., 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Trans. Mach. Learn. Res., 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.

Trans. Mach. Learn. Res., 2024

Are language models rational? The case of coherence norms and belief revision.

CoRR, 2024

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models.

Elias Stengel-Eskin

Peter Hase

Mohit Bansal

CoRR, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.

CoRR, 2024

Rethinking Machine Unlearning for Large Language Models.

CoRR, 2024

LACIE: Listener-Aware Finetuning for Calibration in Large Language Models.

Elias Stengel-Eskin

Peter Hase

Mohit Bansal

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks.

Vaidehi Patil

Peter Hase

Mohit Bansal

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.

Trans. Mach. Learn. Res., 2023

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind.

Swarnadeep Saha

Peter Hase

Mohit Bansal

CoRR, 2023

Adaptive Contextual Perception: How To Generalize To New Backgrounds and Ambiguous Objects.

Zhuofan Ying

Peter Hase

Mohit Bansal

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization.

Swarnadeep Saha

Peter Hase

Mohit Bansal

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Methods for Measuring, Updating, and Visualizing Factual Beliefs in Language Models.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

2022

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives.

Zhuofan Ying

Peter Hase

Mohit Bansal

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs.

CoRR, 2021

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions.

Prateek Yadav

Peter Hase

Mohit Bansal

CoRR, 2021

Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals.

Peter Hase

Harry Xie

Mohit Bansal

CoRR, 2021

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data.

Peter Hase

Mohit Bansal

CoRR, 2021

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations.

Peter Hase

Harry Xie

Mohit Bansal

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

Peter Hase

Mohit Bansal

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

Interpretable Image Recognition with Hierarchical Prototypes.

Proceedings of the Seventh AAAI Conference on Human Computation and Crowdsourcing, 2019

2018

Shall I Compare Thee to a Machine-Written Sonnet? An Approach to Algorithmic Sonnet Generation.

CoRR, 2018

1997

An User Adaptive Navigation Metaphor to Connect and Rate the Coherence of Terms and Complex Objects.

Proceedings of the Hypertext 97, 1997

Peter Hase