Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Programming Refusal with Conditional Activation Steering.

Bruce W. Lee

Inkit Padhi

Karthikeyan Natesan Ramamurthy

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Contextual Value Alignment.

Proceedings of the 2025 IEEE International Conference on Acoustics, 2025

Multi-Level Explanations for Generative Language Models.

Karthikeyan Natesan Ramamurthy

Prasanna Sattigeri

Werner Geyer

Soumya Ghosh

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations.

Aleksandra Mojsilovic

Manish Nagireddy

Karthikeyan Natesan Ramamurthy

Rosario A. Uceda-Sosa

Kush R. Varshney

IEEE Internet Comput., 2024

Granite Guardian.

CoRR, 2024

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails.

CoRR, 2024

Contextual Moral Value Alignment Through Context-Based Aggregation.

CoRR, 2024

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations.

CoRR, 2024

DARE to Diversify: DAta Driven and Diverse LLM REd Teaming.

Manish Nagireddy

Bernat Guillen Pegueroles

Ioana Baldini

Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

ComVas: Contextual Moral Values Alignment System.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Value Alignment from Unstructured Text.

Inkit Padhi

Karthikeyan Natesan Ramamurthy

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions.

Karthikeyan Natesan Ramamurthy

Kush R. Varshney

CoRR, 2023

2022

A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms.

CoRR, 2022

Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits.

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

Manish Nagireddy
