Kellin Pelrine

Orcid: 0009-0003-4671-5554

According to our database, Kellin Pelrine authored at least 44 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution.
CoRR, February, 2026

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks.
CoRR, February, 2026

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering.
CoRR, February, 2026

Large language models can effectively convince people to believe conspiracies.
CoRR, January, 2026

Open Technical Problems in Open-Weight AI Model Risk Management.
Trans. Mach. Learn. Res., 2026

2025
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
CoRR, December, 2025

BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training.
CoRR, October, 2025

CrediBench: Building Web-Scale Network Datasets for Information Integrity.
CoRR, September, 2025

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility.
CoRR, July, 2025

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics.
CoRR, June, 2025

Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability.
CoRR, May, 2025

From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions.
CoRR, April, 2025

Online Influence Campaigns: Strategies and Vulnerabilities.
CoRR, January, 2025

A Guide to Misinformation Detection Data and Evaluation.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Veracity: An Open-Source AI Fact-Checking System.
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards Accessible Information Retrieval for Children With a Mild Intellectual Disability.
Proceedings of the Advances in Bias, Fairness, and Understudied Users in Information Retrieval, 2025

The Structural Safety Generalization Problem.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Can Go AIs Be Adversarially Robust?
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Scaling Trends for Data Poisoning in LLMs.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Epistemic Integrity in Large Language Models.
CoRR, 2024

A Guide to Misinformation Detection Datasets.
CoRR, 2024

A Simulation System Towards Solving Societal-Scale Manipulation.
CoRR, 2024

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks.
CoRR, 2024

Web Retrieval Agents for Evidence-Based Misinformation Detection.
CoRR, 2024

Scaling Laws for Data Poisoning in LLMs.
CoRR, 2024

Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada.
CoRR, 2024

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation.
CoRR, 2024

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation.
CoRR, 2024

Uncertainty Resolution in Misinformation Detection.
CoRR, 2024

Party Prediction for Twitter.
Proceedings of the Eighteenth International AAAI Conference on Web and Social Media, 2024

2023
Exploiting Novel GPT-4 APIs.
CoRR, 2023

Open, Closed, or Small Language Models for Text Classification?
CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.
Proceedings of the International Conference on Machine Learning, 2023

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SWEET - Weakly Supervised Person Name Extraction for Fighting Human Trafficking.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Better Bridges Between Model and Real World.
Proceedings of the 36th Canadian Conference on Artificial Intelligence, 2023

2022
Towards Better Evaluation for Dynamic Link Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Active Keyword Selection to Track Evolving Topics on Twitter.
Proceedings of the IEEE International Conference on Data Mining Workshops, 2022

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
The Surprising Performance of Simple Baselines for Misinformation Detection.
Proceedings of the WWW '21: The Web Conference 2021, 2021

Online Partisan Polarization of COVID-19.
Proceedings of the 2021 International Conference on Data Mining, 2021

2020
ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents.
Proceedings of the Sixth Workshop on Noisy User-generated Text, 2020
