Xuandong Zhao

ORCID: 0009-0008-7888-2783

According to our database, Xuandong Zhao authored at least 56 papers between 2019 and 2025.

Bibliography

2025
PromptArmor: Simple yet Effective Prompt Injection Defenses.
CoRR, July, 2025

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models.
CoRR, July, 2025

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation.
CoRR, July, 2025

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents.
CoRR, June, 2025

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models.
CoRR, May, 2025

Learning to Reason without External Rewards.
CoRR, May, 2025

Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services.
CoRR, May, 2025

In-Context Watermarks for Large Language Models.
CoRR, May, 2025

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning.
CoRR, May, 2025

AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents.
CoRR, May, 2025

Assessing Judging Bias in Large Reasoning Models: An Empirical Study.
CoRR, April, 2025

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs.
CoRR, April, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
CoRR, March, 2025

Improving LLM Safety Alignment with Dual-Objective Optimization.
CoRR, March, 2025

Reward Shaping to Mitigate Reward Hacking in RLHF.
CoRR, February, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty.
CoRR, February, 2025

DIS-CO: Discovering Copyrighted Content in VLMs Training Data.
CoRR, February, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1.
CoRR, February, 2025

Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs.
CoRR, February, 2025

SoK: Watermarking for AI-Generated Content.
Proceedings of the IEEE Symposium on Security and Privacy, 2025

A Practical Examination of AI-Generated Text Detectors for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, 2025

Multimodal Situational Safety.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficiently Identifying Watermarked Segments in Mixed-Source Texts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
Empowering Responsible Use of Large Language Models.
PhD thesis, 2024

An undetectable watermark for generative image models.
IACR Cryptol. ePrint Arch., 2024

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage.
CoRR, 2024

Efficiently Identifying Watermarked Segments in Mixed-Source Texts.
CoRR, 2024

Evaluating Durability: Benchmark Insights into Multimodal Watermarking.
CoRR, 2024

MarkLLM: An Open-Source Toolkit for LLM Watermarking.
CoRR, 2024

Mapping the Increasing Use of LLMs in Scientific Papers.
CoRR, 2024

Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models.
CoRR, 2024

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs.
CoRR, 2024

Weak-to-Strong Jailbreaking on Large Language Models.
CoRR, 2024

Invisible Image Watermarks Are Provably Removable Using Generative AI.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DE-COP: Detecting Copyrighted Content in Language Models Training Data.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provable Robust Watermarking for AI-Generated Text.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Survey on Detection of LLMs-Generated Content.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Chatbot and Fatigued Driver: Exploring the Use of LLM-Based Voice Assistants for Driving Fatigue.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024

Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats.
CoRR, 2023

Private Prediction Strikes Back! Private Kernelized Nearest Neighbors with Individual Rényi Filter.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Protecting Language Generation Models via Invisible Watermarking.
Proceedings of the International Conference on Machine Learning, 2023

Pre-trained Language Models Can be Fully Zero-Shot Learners.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Provably Confidential Language Modelling.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Distillation-Resistant Watermarking for Model Protection in NLP.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
An Optimal Reduction of TV-Denoising to Adaptive Online Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning.
CoRR, 2020

2019
Multi-Size Computer-Aided Diagnosis Of Positron Emission Tomography Images Using Graph Convolutional Networks.
Proceedings of the 16th IEEE International Symposium on Biomedical Imaging, 2019

Predicting Alzheimer's Disease by Hierarchical Graph Convolution from Positron Emission Tomography Imaging.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019
