Terry Yue Zhuo

ORCID: 0000-0002-5760-5188

Affiliations:
  • Monash University, Department of Data Science and AI, Melbourne, Australia
  • University of New South Wales, School of CSE, Australia


According to our database, Terry Yue Zhuo authored at least 66 papers between 2020 and 2026.

Bibliography

2026
Bypassing Guardrails: Lessons Learned from Red Teaming ChatGPT.
ACM Trans. Softw. Eng. Methodol., May, 2026

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades.
CoRR, May, 2026

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction.
CoRR, April, 2026

Identifying and Mitigating API Misuse in Large Language Models.
IEEE Trans. Software Eng., March, 2026

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis.
CoRR, March, 2026

Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss.
ACM Trans. Softw. Eng. Methodol., February, 2026

Less Is More: DocString Compression in Code Generation.
ACM Trans. Softw. Eng. Methodol., February, 2026

Watermarking LLM Agent Trajectories.
CoRR, February, 2026

SecCodeBench-V2 Technical Report.
CoRR, February, 2026

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model.
CoRR, February, 2026

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack.
CoRR, February, 2026

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces.
CoRR, January, 2026

Less is more: Towards green code large language models via unified structural pruning.
Inf. Process. Manag., 2026

PrivCode: When Code Generation Meets Differential Privacy.
Proceedings of the 33rd Annual Network and Distributed System Security Symposium, 2026

2025
SimpleDevQA: Benchmarking Large Language Models on Development Knowledge QA.
CoRR, December, 2025

Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead.
CoRR, December, 2025

LLMAID: Identifying AI Capabilities in Android Apps with LLMs.
CoRR, November, 2025

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence.
CoRR, November, 2025

LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead.
CoRR, October, 2025

HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities.
CoRR, October, 2025

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution.
CoRR, October, 2025

Bamboo: LLM-Driven Discovery of API-Permission Mappings in the Android Framework.
CoRR, October, 2025

The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation.
CoRR, September, 2025

An Empirical Study of Vulnerabilities in Python Packages and Their Detection.
CoRR, September, 2025

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo.
CoRR, August, 2025

PTMPicker: Facilitating Efficient Pretrained Model Selection for Application Developers.
CoRR, August, 2025

Cyber-Zero: Training Cybersecurity Agents without Runtime.
CoRR, August, 2025

A Mixture of Linear Corrections Generates Secure Code.
CoRR, July, 2025

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code.
CoRR, May, 2025

From Code to Courtroom: LLMs as the New Software Judges.
CoRR, March, 2025

CodeArena: A Collective Evaluation Platform for LLM Code Generation.
CoRR, March, 2025

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

NLP+Code: Code Intelligence in Language Models.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025


CodeArena: A Collective Evaluation Platform for LLM Code Generation.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2025

2024
Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models.
IEEE Trans. Software Eng., September, 2024

A First Look at On-device Models in iOS Apps.
ACM Trans. Softw. Eng. Methodol., January, 2024

GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models.
CoRR, 2024

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks.
CoRR, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions.
CoRR, 2024

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts.
CoRR, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.
CoRR, 2024

Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code.
CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.
CoRR, 2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models.
CoRR, 2024

OctoPack: Instruction Tuning Code Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ICE-Score: Instructing Large Language Models to Evaluate Code.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

2023
StarCoder: may the source be with you!
Trans. Mach. Learn. Res., 2023

Fake News Detectors are Biased against Texts Generated by Large Language Models.
CoRR, 2023

Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?
CoRR, 2023

Data Augmentation Approaches for Source Code Models: A Survey.
CoRR, 2023

Large Language Models Are State-of-the-Art Evaluators of Code Generation.
CoRR, 2023

Exploring AI Ethics of ChatGPT: A Diagnostic Analysis.
CoRR, 2023

SantaCoder: don't reach for the stars!
CoRR, 2023

Training-free Lexical Backdoor Attacks on Language Models.
Proceedings of the ACM Web Conference 2023, 2023

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Rethinking Round-Trip Translation for Machine Translation Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Rethinking Round-trip Translation for Automatic Machine Translation Evaluation.
CoRR, 2022

Paraphrasing Techniques for Maritime QA system.
Proceedings of the 25th International Conference on Information Fusion, 2022

2021
PyArmadillo: a streamlined linear algebra library for Python.
J. Open Source Softw., 2021

Neural-Symbolic Commonsense Reasoner with Relation Predictors.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
CosMo: Conditional Seq2Seq-based Mixture Model for Zero-Shot Commonsense Question Answering.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

