Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness.

Aaron Jiaxun Li

Satyapriya Krishna

Himabindu Lakkaraju

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective.

Trans. Mach. Learn. Res., 2024

Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL.

CoRR, 2024

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness.

Aaron Jiaxun Li

Satyapriya Krishna

Himabindu Lakkaraju

CoRR, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence.

CoRR, 2024

Croissant: A Metadata Format for ML-Ready Datasets.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Understanding the Effects of Iterative Prompting on Truthfulness.

Satyapriya Krishna

Chirag Agarwal

Himabindu Lakkaraju

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Black-Box Access is Insufficient for Rigorous AI Audits.

Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024

On the Trade-offs between Adversarial Robustness and Actionable Explanations.

Satyapriya Krishna

Chirag Agarwal

Himabindu Lakkaraju

Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA, 2024

2023

Explaining machine learning models with interactive natural language conversations using TalkToModel.

Nat. Mach. Intell., August, 2023

On the Intersection of Self-Correction and Trust in Language Models.

Satyapriya Krishna

CoRR, 2023

Are Large Language Models Post Hoc Explainers?

CoRR, 2023

Post Hoc Explanations of Language Models Can Improve Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten.

Satyapriya Krishna

Jiaqi Ma

Himabindu Lakkaraju

Proceedings of the International Conference on Machine Learning, 2023

2022

TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues.

CoRR, 2022

Rethinking Stability for Attribution-based Explanations.

CoRR, 2022

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective.

CoRR, 2022

OpenXAI: Towards a Transparent Evaluation of Model Explanations.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Measuring Fairness of Text Classifiers via Prediction Sensitivity.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

Grounding Complex Navigational Instructions Using Scene Graphs.

Michiel de Jong

Satyapriya Krishna

Anuva Agarwal

CoRR, 2021

BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation.

Proceedings of the FAccT '21: 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021

Towards Realistic Single-Task Continuous Learning Research for NER.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

ADePT: Auto-encoder based Differentially Private Text Transformation.

Satyapriya Krishna

Rahul Gupta

Christophe Dupuy

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

Towards classification parity across cohorts.

CoRR, 2020

2019

FineText: Text Classification via Attention-based Language Model Fine-tuning.

CoRR, 2019

Satyapriya Krishna
