Shu Yang

Orcid: 0009-0009-1786-9187

Affiliations:

King Abdullah University of Science and Technology, Provable Responsible AI and Data Analytics (PRADA) Lab, Thuwal, Saudi Arabia
University of Macau, NLP2CT Lab, Taipa, Macau (former)

According to our database, Shu Yang authored at least 53 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems.

CoRR, April, 2026

Hierarchical Alignment: Enforcing Hierarchical Instruction-Following in LLMs through Logical Consistency.

CoRR, April, 2026

Multi-User Large Language Model Agents.

CoRR, April, 2026

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache.

CoRR, March, 2026

Neuron-Aware Data Selection In Instruction Tuning For Large Language Models.

CoRR, March, 2026

Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness.

CoRR, March, 2026

Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images.

CoRR, March, 2026

Concept-Based Dictionary Learning for Inference-Time Safety in Vision Language Action Models.

CoRR, February, 2026

Faithful-Patchscopes: Understanding and Mitigating Model Bias in Hidden Representations Explanation of Large Language Models.

CoRR, February, 2026

Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO.

Junchi Yao

Lokranjan Lakshmikanthan

CoRR, January, 2026

Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning.

CoRR, January, 2026

AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor.

CoRR, January, 2026

Towards Representation Backdoor on CLIP via Concept Confusion.

Trans. Mach. Learn. Res., 2026

When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Investigating CoT Monitorability in Large Reasoning Models.

CoRR, November, 2025

MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models.

CoRR, November, 2025

PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization.

CoRR, October, 2025

Can AI Truly Represent Your Voice in Deliberations? A Comprehensive Study of Large-Scale Opinion Aggregation with LLMs.

CoRR, October, 2025

Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models.

CoRR, September, 2025

PersRM-R1: Enhance Personalized Reward Modeling with Reinforcement Learning.

CoRR, August, 2025

Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models.

CoRR, August, 2025

Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs.

CoRR, June, 2025

The Compositional Architecture of Regret in Large Language Models.

CoRR, June, 2025

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images.

CoRR, June, 2025

Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs.

CoRR, June, 2025

Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons.

CoRR, June, 2025

Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning.

CoRR, May, 2025

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

CoRR, May, 2025

Understanding Aha Moments: from External Observations to Internal Mechanisms.

CoRR, April, 2025

Rethinking Prompt-based Debiasing in Large Language Models.

CoRR, March, 2025

C² ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion.

CoRR, March, 2025

Evaluating Data Influence in Meta Learning.

CoRR, January, 2025

RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns.

Trans. Assoc. Comput. Linguistics, 2025

A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions.

Comput. Linguistics, 2025

Stable Vision Concept Transformers for Medical Diagnosis.

Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2025

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Understanding How Value Neurons Shape the Generation of Specified Values in LLMs.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

Understanding the Repeat Curse in Large Language Models from a Feature Perspective.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Rethinking Prompt-based Debiasing in Large Language Model.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Dissecting Misalignment of Multimodal Large Language Models via Influence Function.

CoRR, 2024

What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs.

CoRR, 2024

Understanding Reasoning in Chain-of-Thought from the Hopfieldian View.

CoRR, 2024

A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning.

CoRR, 2024

Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top.

CoRR, 2024

PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression.

CoRR, 2024

Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs.

CoRR, 2024

Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima.

CoRR, 2024

MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning.

CoRR, 2024

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation.

CoRR, 2023
