Kellin Pelrine

CoRR, February, 2026

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering.

CoRR, February, 2026

Large language models can effectively convince people to believe conspiracies.

Adam Gleave

David Rand

Gordon Pennycook

CoRR, January, 2026

Open Technical Problems in Open-Weight AI Model Risk Management.

Trans. Mach. Learn. Res., 2026

2025

Emergent Persuasion: Will LLMs Persuade Without Being Prompted?

CoRR, December, 2025

BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training.

Aurélien Bück-Kaeffer

Je Qin Chooi

Dan Zhao

Zachary Yang

CoRR, October, 2025

CrediBench: Building Web-Scale Network Datasets for Information Integrity.

Michael M. Bronstein

Shenyang Huang

CoRR, September, 2025

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility.

Brendan Murphy

Dillon Bowen

Shahrad Mohammadzadeh

Julius Broomfield

Adam Gleave

CoRR, July, 2025

It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics.

Matthew Kowal

Jasper Timm

CoRR, June, 2025

Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability.

CoRR, May, 2025

From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions.

CoRR, April, 2025

Online Influence Campaigns: Strategies and Vulnerabilities.

Gabrielle Péloquin-Skulski

CoRR, January, 2025

A Guide to Misinformation Detection Data and Evaluation.

Camille Thibault

Jacob-Junqi Tian

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents.

Dan Zhao

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Veracity: An Open-Source AI Fact-Checking System.

Taylor Lynn Curtis

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility.

Brendan Murphy

Dillon Bowen

Shahrad Mohammadzadeh

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Towards Accessible Information Retrieval for Children With a Mild Intellectual Disability.

Proceedings of the Advances in Bias, Fairness, and Understudied Users in Information Retrieval, 2025

The Structural Safety Generalization Problem.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Can Go AIs Be Adversarially Robust?

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Scaling Trends for Data Poisoning in LLMs.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Epistemic Integrity in Large Language Models.

Bijean Ghafouri

Shahrad Mohammadzadeh

Gabrielle Péloquin-Skulski

CoRR, 2024

A Guide to Misinformation Detection Datasets.

Camille Thibault

CoRR, 2024

A Simulation System Towards Solving Societal-Scale Manipulation.

CoRR, 2024

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks.

CoRR, 2024

Web Retrieval Agents for Evidence-Based Misinformation Detection.

CoRR, 2024

Scaling Laws for Data Poisoning in LLMs.

CoRR, 2024

Regional and Temporal Patterns of Partisan Polarization during the COVID-19 Pandemic in the United States and Canada.

Zachary Yang

Anne Imouza

Gabrielle Desrosiers-Brisebois

Cécile Amadoro

Sacha Levy

CoRR, 2024

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation.

Mauricio Rivera

CoRR, 2024

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation.

Tyler Vergho

CoRR, 2024

Uncertainty Resolution in Misinformation Detection.

Yury Orlovskiy

Camille Thibault

Anne Imouza

Gabrielle Desrosiers-Brisebois

CoRR, 2024

Party Prediction for Twitter.

Aarash Feizi

Cécile Amadoro

André Blais

Proceedings of the Eighteenth International AAAI Conference on Web and Social Media, 2024

2023

Exploiting Novel GPT-4 APIs.

CoRR, 2023

Open, Closed, or Small Language Models for Text Classification?

Hao Yu

Zachary Yang

CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.

Proceedings of the International Conference on Machine Learning, 2023

Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SWEET - Weakly Supervised Person Name Extraction for Fighting Human Trafficking.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Better Bridges Between Model and Real World.

Proceedings of the 36th Canadian Conference on Artificial Intelligence, 2023

2022

Towards Better Evaluation for Dynamic Link Prediction.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Active Keyword Selection to Track Evolving Topics on Twitter.

Proceedings of the IEEE International Conference on Data Mining Workshops, 2022

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking.

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

The Surprising Performance of Simple Baselines for Misinformation Detection.

Jacob Danovitch

Gabrielle Desrosiers-Brisebois

Proceedings of the WWW '21: The Web Conference 2021, 2021

Online Partisan Polarization of COVID-19.

André Blais

Proceedings of the 2021 International Conference on Data Mining, 2021

2020

ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents.

Jacob Danovitch

Albert Orozco Camacho