Shu Yang
ORCID: 0009-0009-1786-9187
Affiliations:
- King Abdullah University of Science and Technology, Provable Responsible AI and Data Analytics (PRADA) Lab, Thuwal, Saudi Arabia
- University of Macau, NLP2CT Lab, Taipa, Macau (former)
According to our database, Shu Yang authored at least 53 papers between 2023 and 2026.
Bibliography
2026
STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems.
CoRR, April, 2026
Hierarchical Alignment: Enforcing Hierarchical Instruction-Following in LLMs through Logical Consistency.
CoRR, April, 2026
CoRR, March, 2026
Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness.
CoRR, March, 2026
Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images.
CoRR, March, 2026
Concept-Based Dictionary Learning for Inference-Time Safety in Vision Language Action Models.
CoRR, February, 2026
Faithful-Patchscopes: Understanding and Mitigating Model Bias in Hidden Representations Explanation of Large Language Models.
CoRR, February, 2026
Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO.
CoRR, January, 2026
CoRR, January, 2026
CoRR, January, 2026
Trans. Mach. Learn. Res., 2026
When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026
2025
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models.
CoRR, November, 2025
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization.
CoRR, October, 2025
Can AI Truly Represent Your Voice in Deliberations? A Comprehensive Study of Large-Scale Opinion Aggregation with LLMs.
CoRR, October, 2025
Benchmarking and Mitigate Psychological Sycophancy in Medical Vision-Language Models.
CoRR, September, 2025
CoRR, August, 2025
Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models.
CoRR, August, 2025
Is Long-to-Short a Free Lunch? Investigating Inconsistency and Reasoning Efficiency in LRMs.
CoRR, June, 2025
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images.
CoRR, June, 2025
CoRR, June, 2025
Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons.
CoRR, June, 2025
Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning.
CoRR, May, 2025
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
CoRR, May, 2025
CoRR, April, 2025
CoRR, March, 2025
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns.
Trans. Assoc. Comput. Linguistics, 2025
Comput. Linguistics, 2025
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2025
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements.
Proceedings of the Findings of the Association for Computational Linguistics, 2025
Proceedings of the Findings of the Association for Computational Linguistics, 2025
2024
CoRR, 2024
What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima.
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
2023
Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation.
CoRR, 2023