Xuandong Zhao

ORCID: 0009-0008-7888-2783

According to our database, Xuandong Zhao authored at least 56 papers between 2019 and 2025.

Bibliography

2025
PromptArmor: Simple yet Effective Prompt Injection Defenses.
CoRR, July, 2025

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models.
CoRR, July, 2025

The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation.
CoRR, July, 2025

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents.
CoRR, June, 2025

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models.
CoRR, May, 2025

Learning to Reason without External Rewards.
CoRR, May, 2025

Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services.
CoRR, May, 2025

In-Context Watermarks for Large Language Models.
CoRR, May, 2025

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning.
CoRR, May, 2025

AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents.
CoRR, May, 2025

Assessing Judging Bias in Large Reasoning Models: An Empirical Study.
CoRR, April, 2025

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs.
CoRR, April, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
CoRR, March, 2025

Improving LLM Safety Alignment with Dual-Objective Optimization.
CoRR, March, 2025

Reward Shaping to Mitigate Reward Hacking in RLHF.
CoRR, February, 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty.
CoRR, February, 2025

DIS-CO: Discovering Copyrighted Content in VLMs Training Data.
CoRR, February, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1.
CoRR, February, 2025

Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs.
CoRR, February, 2025

SoK: Watermarking for AI-Generated Content.
Proceedings of the IEEE Symposium on Security and Privacy, 2025

A Practical Examination of AI-Generated Text Detectors for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, 2025

Multimodal Situational Safety.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Efficiently Identifying Watermarked Segments in Mixed-Source Texts.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025

2024
Empowering Responsible Use of Large Language Models.
PhD thesis, 2024

An undetectable watermark for generative image models.
IACR Cryptol. ePrint Arch., 2024

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage.
CoRR, 2024

Efficiently Identifying Watermarked Segments in Mixed-Source Texts.
CoRR, 2024

Evaluating Durability: Benchmark Insights into Multimodal Watermarking.
CoRR, 2024

MarkLLM: An Open-Source Toolkit for LLM Watermarking.
CoRR, 2024

Mapping the Increasing Use of LLMs in Scientific Papers.
CoRR, 2024

Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models.
CoRR, 2024

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs.
CoRR, 2024

Weak-to-Strong Jailbreaking on Large Language Models.
CoRR, 2024

Invisible Image Watermarks Are Provably Removable Using Generative AI.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DE-COP: Detecting Copyrighted Content in Language Models Training Data.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provable Robust Watermarking for AI-Generated Text.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Survey on Detection of LLMs-Generated Content.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Chatbot and Fatigued Driver: Exploring the Use of LLM-Based Voice Assistants for Driving Fatigue.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024

Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats.
CoRR, 2023

Private Prediction Strikes Back! Private Kernelized Nearest Neighbors with Individual Rényi Filter.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Protecting Language Generation Models via Invisible Watermarking.
Proceedings of the International Conference on Machine Learning, 2023

Pre-trained Language Models Can be Fully Zero-Shot Learners.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Provably Confidential Language Modelling.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Distillation-Resistant Watermarking for Model Protection in NLP.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
An Optimal Reduction of TV-Denoising to Adaptive Online Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning.
CoRR, 2020

2019
Multi-Size Computer-Aided Diagnosis Of Positron Emission Tomography Images Using Graph Convolutional Networks.
Proceedings of the 16th IEEE International Symposium on Biomedical Imaging, 2019

Predicting Alzheimer's Disease by Hierarchical Graph Convolution from Positron Emission Tomography Imaging.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019
