We stand with Ukraine

Wenxuan Wang

Orcid: 0000-0002-9803-8204

Affiliations:

Chinese University of Hong Kong, Department of Computer Science and Engineering, Hong Kong (PhD 2023)

According to our database, Wenxuan Wang authored at least 103 papers between 2017 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2025

ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games?

CoRR, October, 2025

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning.

CoRR, October, 2025

Testing and Enhancing Multi-Agent Systems for Robust Code Generation.

Shing-Chi Cheung

CoRR, October, 2025

Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards.

CoRR, October, 2025

Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare.

CoRR, October, 2025

The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems.

Zhuosheng Zhang

CoRR, September, 2025

Metamorphic Testing for Audio Content Moderation Software.

CoRR, September, 2025

The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies.

CoRR, September, 2025

Digging Into the Internal: Causality-Based Analysis of LLM Function Calling.

CoRR, September, 2025

Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models.

CoRR, August, 2025

DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models.

Zhuosheng Zhang

CoRR, August, 2025

Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models.

CoRR, August, 2025

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications.

CoRR, August, 2025

ChartM³: Benchmarking Chart Editing with Multimodal Instructions.

CoRR, July, 2025

3D Software Synthesis Guided by Constraint-Expressive Intermediate Representation.

CoRR, July, 2025

Reasoning Models Can be Easily Hacked by Fake Reasoning Bias.

CoRR, July, 2025

POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering.

CoRR, July, 2025

A Survey of Deep Learning for Geometry Problem Solving.

CoRR, July, 2025

SoK: Evaluating Jailbreak Guardrails for Large Language Models.

CoRR, June, 2025

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models.

CoRR, May, 2025

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models.

CoRR, May, 2025

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training.

CoRR, May, 2025

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards.

CoRR, May, 2025

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems.

Shing-Chi Cheung

CoRR, May, 2025

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?

CoRR, May, 2025

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning.

Zhuosheng Zhang

CoRR, April, 2025

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models.

CoRR, March, 2025

TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM.

CoRR, March, 2025

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models.

CoRR, March, 2025

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models.

CoRR, February, 2025

A Survey of LLM-based Agents in Medicine: How far are we from Baymax?

CoRR, February, 2025

VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks.

CoRR, February, 2025

Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models.

CoRR, February, 2025

Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries.

CoRR, February, 2025

How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs.

Shing-Chi Cheung

CoRR, January, 2025

Divide-and-Conquer: Generating UI Code from Screenshots.

Proc. ACM Softw. Eng., 2025

On the shortcut learning in multilingual neural machine translation.

Neurocomputing, 2025

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models.

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, 2025

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Competing Large Language Models in Multi-Agent Gaming Environments.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

IntentionESC: An Intention-Centered Framework for Enhancing Emotional Support in Dialogue Systems.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

A Survey of LLM-based Agents in Medicine: How far are we from Baymax?

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

EAGLE: Expert-Guided Self-Enhancement for Preference Alignment in Pathology Large Vision-Language Model.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Chain-of-Jailbreak Attack for Image Generation Models via Step by Step Editing.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

2024

Understanding and Mitigating the Uncertainty in Zero-Shot Translation.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs.

CoRR, 2024

Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking.

CoRR, 2024

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step.

CoRR, 2024

Learning to Ask: When LLMs Meet Unclear Instruction.

CoRR, 2024

Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness.

CoRR, 2024

On the Resilience of Multi-Agent Systems with Malicious Agents.

CoRR, 2024

Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach.

CoRR, 2024

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack.

CoRR, 2024

Exploring Multi-Lingual Bias of Large Code Models in Code Generation.

CoRR, 2024

How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO.

CoRR, 2024

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments.

CoRR, 2024

Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models.

CoRR, 2024

The Earth is Flat? Unveiling Factual Errors in Large Language Models.

CoRR, 2024

A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models.

CoRR, 2024

Apathetic or Empathetic? Evaluating LLMs' Emotional Alignments with Humans.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

New Job, New Gender? Measuring the Social Bias in Image Generation Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How.

Chun Yong Chong

Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

On the Reliability of Psychological Scales on Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Boosting Adversarial Transferability by Block Shuffle and Rotation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Does ChatGPT Know That It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

All Languages Matter: On the Multilingual Safety of LLMs.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

APIBench: A Benchmark Dataset for Evaluating API Recommendation Approaches in Python and Java.

Dataset, November, 2023

Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We?

IEEE Trans. Software Eng., April, 2023

Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models.

CoRR, 2023

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench.

CoRR, 2023

All Languages Matter: On the Multilingual Safety of Large Language Models.

CoRR, 2023

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench.

CoRR, 2023

ChatGPT an ENFJ, Bard an ISTJ: Empirical Study on Personalities of Large Language Models.

CoRR, 2023

Constructing Effective In-Context Demonstration for Code Intelligence Tasks: An Empirical Study.

CoRR, 2023

ParroT: Translating During Chat Using Large Language Models.

CoRR, 2023

ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark.

CoRR, 2023

Is ChatGPT A Good Translator? A Preliminary Study.

CoRR, 2023

BiasAsker: Measuring the Bias in Conversational AI System.

Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023

An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software.

Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

Generative Type Inference for Python.

Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

What Makes Good In-Context Demonstrations for Code Intelligence Tasks with LLMs?

Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023

Validating Multimedia Content Moderation Software via Semantic Fusion.

Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

MTTM: Metamorphic Testing for Textual Content Moderation Software.

Proceedings of the 45th IEEE/ACM International Conference on Software Engineering, 2023

ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Improving the Transferability of Adversarial Samples by Path-Augmented Method.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages.

Proceedings of the Seventh Conference on Machine Translation, 2022

AEON: a method for automatic evaluation of NLP test cases.

Proceedings of the ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18, 2022

Improving Adversarial Transferability via Neuron Attribution-based Attacks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

APIBench: A Benchmark Dataset for Evaluating API Recommendation Approaches in Python and Java.

Dataset, December, 2021

Language Models are Good Translators.

CoRR, 2021

2020

Rethinking the Value of Transformer Components.

Proceedings of the 28th International Conference on Computational Linguistics, 2020

2017

RUC at MediaEval 2017: Predicting Media Interestingness Task.

Proceedings of the Working Notes Proceedings of the MediaEval 2017 Workshop co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), 2017

Emotion recognition with multimodal features and temporal models.

Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017
