We stand with Ukraine

Terry Yue Zhuo

Orcid: 0000-0002-5760-5188

Affiliations:

Monash University, Department of Data Science and AI, Melbourne, Australia
University of New South Wales, School of CSE, Australia

According to our database¹, Terry Yue Zhuo authored at least 66 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2026

Bypassing Guardrails: Lessons Learned from Red Teaming ChatGPT.

[DOI]

,

,

,

,

ACM Trans. Softw. Eng. Methodol., May, 2026

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades.

[DOI]

,

,

,

,

,

,

,

CoRR, May, 2026

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction.

[DOI]

,

,

,

,

,

,

,

,

CoRR, April, 2026

Identifying and Mitigating API Misuse in Large Language Models.

[DOI]

,

,

,

,

,

,

IEEE Trans. Software Eng., March, 2026

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis.

[DOI]

,

,

,

Farima Farmahinifarahani

,

,

,

,

Rajdeep Mukherjee

,

CoRR, March, 2026

Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss.

[DOI]

,

,

,

,

,

,

ACM Trans. Softw. Eng. Methodol., February, 2026

Less Is More: DocString Compression in Code Generation.

[DOI]

,

,

,

,

,

,

,

,

,

ACM Trans. Softw. Eng. Methodol., February, 2026

Watermarking LLM Agent Trajectories.

[DOI]

,

,

,

,

,

,

,

,

CoRR, February, 2026

SecCodeBench-V2 Technical Report.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, February, 2026

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model.

[DOI]

,

,

,

,

,

,

CoRR, February, 2026

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack.

[DOI]

,

,

,

CoRR, February, 2026

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces.

[DOI]

Mike A. Merrill

,

Alexander Glenn Shaw

,

Nicholas Carlini

,

,

,

,

,

Jeong Yeon Shin

,

,

Estefany Kelly Buchanan

,

,

,

,

,

,

Marianna Nezhurina

,

,

,

Orfeas Menis-Mastromichalakis

,

,

,

,

,

Leon Liangyu Chen

,

,

,

,

,

,

,

,

,

Steven Dillmann

,

,

Andrew Lanpouthakoun

,

,

,

Etash Kumar Guha

,

Gabriel H. S. Dreiman

,

,

,

,

Niklas Muennighoff

,

,

,

Shreyas Pimpalgaonkar

,

Tushar Aggarwal

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Dariush Wahdany

,

,

,

,

,

,

,

Arinbjörn Kolbeinsson

,

,

Christopher Michael Rytting

,

,

,

,

,

CoRR, January, 2026

Less is more: Towards green code large language models via unified structural pruning.

[DOI]

,

,

,

,

,

,

,

Inf. Process. Manag., 2026

PrivCode: When Code Generation Meets Differential Privacy.

[DOI]

,

,

,

,

,

Matt Fredrikson

,

Proceedings of the 33rd Annual Network and Distributed System Security Symposium, 2026

2025

SimpleDevQA: Benchmarking Large Language Models on Development Knowledge QA.

[DOI]

,

,

,

,

,

,

,

,

,

CoRR, December, 2025

Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, December, 2025

LLMAID: Identifying AI Capabilities in Android Apps with LLMs.

[DOI]

,

,

,

,

,

,

,

,

,

CoRR, November, 2025

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Wangchunshu Zhou

,

,

,

,

,

,

,

,

,

,

,

Shuangyong Song

,

,

,

,

Zhaoxiang Zhang

,

CoRR, November, 2025

LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead.

[DOI]

,

,

,

Christoph Treude

,

,

,

,

CoRR, October, 2025

HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities.

[DOI]

,

,

,

,

,

,

,

,

CoRR, October, 2025

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution.

[DOI]

,

,

,

,

,

,

Bhupesh Bishnoi

,

Vaisakhi Mishra

,

,

,

,

,

,

,

,

,

,

Sabina Abdurakhmanova

,

,

,

,

Kenneth Hamilton

,

,

,

,

,

,

,

,

,

,

,

,

Wasi Uddin Ahmad

,

,

,

,

,

Torsten Scholak

,

Leandro von Werra

CoRR, October, 2025

Bamboo: LLM-Driven Discovery of API-Permission Mappings in the Android Framework.

[DOI]

,

,

,

,

,

,

,

,

CoRR, October, 2025

The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation.

[DOI]

,

,

,

,

,

CoRR, September, 2025

An Empirical Study of Vulnerabilities in Python Packages and Their Detection.

[DOI]

,

,

,

,

,

CoRR, September, 2025

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo.

[DOI]

,

,

,

,

CoRR, August, 2025

PTMPicker: Facilitating Efficient Pretrained Model Selection for Application Developers.

[DOI]

,

,

,

,

,

,

CoRR, August, 2025

Cyber-Zero: Training Cybersecurity Agents without Runtime.

[DOI]

,

,

,

,

CoRR, August, 2025

A Mixture of Linear Corrections Generates Secure Code.

[DOI]

,

,

,

Matt Fredrikson

,

Corina S. Pasareanu

CoRR, July, 2025

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code.

[DOI]

,

,

,

,

,

,

,

,

,

,

,

CoRR, May, 2025

From Code to Courtroom: LLMs as the New Software Judges.

[DOI]

,

,

,

Christoph Treude

,

,

,

,

CoRR, March, 2025

CodeArena: A Collective Evaluation Platform for LLM Code Generation.

[DOI]

,

,

,

,

,

,

,

CoRR, March, 2025

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions.

[DOI]

,

,

,

,

,

Ratnadira Widyasari

,

Imam Nur Bani Yusuf

,

,

,

,

,

,

,

Armel Randy Zebaze

,

,

,

,

,

,

,

et al.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

NLP+Code: Code Intelligence in Language Models.

[DOI]

,

,

,

Wasi Uddin Ahmad

,

,

Loubna Ben Allal

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code.

[DOI]

Proceedings of the 31st International Conference on Computational Linguistics, 2025

CodeArena: A Collective Evaluation Platform for LLM Code Generation.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2025

2024

Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models.

[DOI]

,

,

,

,

,

IEEE Trans. Software Eng., September, 2024

A First Look at On-device Models in iOS Apps.

[DOI]

,

,

,

,

ACM Trans. Softw. Eng. Methodol., January, 2024

GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models.

[DOI]

,

Justine Gehring

,

,

Eilif B. Müller

,

,

,

CoRR, 2024

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks.

[DOI]

,

,

,

,

,

,

CoRR, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions.

[DOI]

CoRR, 2024

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts.

[DOI]

,

,

,

,

CoRR, 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.

[DOI]

CoRR, 2024

Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code.

[DOI]

,

,

,

Premkumar T. Devanbu

,

CoRR, 2024

StarCoder 2 and The Stack v2: The Next Generation.

[DOI]

,

,

Loubna Ben Allal

,

Federico Cassano

,

Joel Lamy-Poirier

,

,

,

,

,

,

,

,

,

,

,

,

,

Dmitry Abulkhanov

,

,

,

,

,

,

,

,

Evgenii Zheltonozhskii

,

Nii Osae Osae Dade

,

,

,

,

,

,

,

,

,

Niklas Muennighoff

,

,

Muhtasham Oblokulov

,

Christopher Akiki

,

,

,

,

,

,

,

,

Olivier Dehaene

,

,

,

Julian J. McAuley

,

,

Torsten Scholak

,

Sébastien Paquet

,

Jennifer Robinson

,

Carolyn Jane Anderson

,

Nicolas Chapados

,

et al.

CoRR, 2024

Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models.

[DOI]

,

,

Nitchakarn Suppattarachai

,

Leandro von Werra

,

,

,

Niklas Muennighoff

CoRR, 2024

OctoPack: Instruction Tuning Code Large Language Models.

[DOI]

Niklas Muennighoff

,

,

Armel Randy Zebaze

,

,

,

,

,

,

Leandro von Werra

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ICE-Score: Instructing Large Language Models to Evaluate Code.

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

2023

StarCoder: may the source be with you!

[DOI]

,

Loubna Ben Allal

,

,

Niklas Muennighoff

,

,

,

,

Christopher Akiki

,

,

,

,

Evgenii Zheltonozhskii

,

,

,

Olivier Dehaene

,

Mishig Davaadorj

,

Joel Lamy-Poirier

,

,

,

Nicolas Gontier

,

,

,

,

Logesh Kumar Umapathi

,

,

Benjamin Lipkin

,

Muhtasham Oblokulov

,

,

,

Jason T. Stillerman

,

Siva Sankalp Patel

,

Dmitry Abulkhanov

,

,

,

,

,

Urvashi Bhattacharyya

,

,

,

,

,

,

,

,

,

,

,

Claire Schlesinger

,

Hailey Schoelkopf

,

,

,

,

,

Jennifer Robinson

,

Carolyn Jane Anderson

,

Brendan Dolan-Gavitt

,

Danish Contractor

,

,

,

Dzmitry Bahdanau

,

,

Carlos Muñoz Ferrandis

,

,

,

,

Leandro von Werra

,

Trans. Mach. Learn. Res., 2023

Fake News Detectors are Biased against Texts Generated by Large Language Models.

[DOI]

,

,

Jonibek Mansurov

,

,

CoRR, 2023

Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?

[DOI]

,

,

,

,

,

,

CoRR, 2023

Data Augmentation Approaches for Source Code Models: A Survey.

[DOI]

,

,

,

,

,

,

,

CoRR, 2023

Large Language Models Are State-of-the-Art Evaluators of Code Generation.

[DOI]

CoRR, 2023

Exploring AI Ethics of ChatGPT: A Diagnostic Analysis.

[DOI]

,

,

,

CoRR, 2023

SantaCoder: don't reach for the stars!

[DOI]

CoRR, 2023

Training-free Lexical Backdoor Attacks on Language Models.

[DOI]

,

,

,

,

,

Proceedings of the ACM Web Conference 2023, 2023

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text.

[DOI]

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?

[DOI]

,

,

,

,

,

Patrick Charles Emerton

,

Genevieve Grant

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities.

[DOI]

,

,

,

,

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex.

[DOI]

,

,

,

,

,

Gholamreza Haffari

,

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Rethinking Round-Trip Translation for Machine Translation Evaluation.

[DOI]

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing.

[DOI]

,

,

,

,

Gholamreza Haffari

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Rethinking Round-trip Translation for Automatic Machine Translation Evaluation.

[DOI]

,

,

,

CoRR, 2022

Paraphrasing Techniques for Maritime QA system.

[DOI]

,

,

,

,

,

,

,

Proceedings of the 25th International Conference on Information Fusion, 2022

2021

PyArmadillo: a streamlined linear algebra library for Python.

[DOI]

,

,

Conrad Sanderson

J. Open Source Softw., 2021

Neural-Symbolic Commonsense Reasoner with Relation Predictors.

[DOI]

Farhad Moghimifar

,

,

,

Gholamreza Haffari

,

Mahsa Baktashmotlagh

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

CosMo: Conditional Seq2Seq-based Mixture Model for Zero-Shot Commonsense Question Answering.

[DOI]

Farhad Moghimifar

,

,

,

Mahsa Baktashmotlagh

,

Gholamreza Haffari

Proceedings of the 28th International Conference on Computational Linguistics, 2020
