Zongzhang Zhang

ORCID: 0000-0002-9238-4747

Affiliations:

Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing, China
Soochow University, School of Computer Science and Technology, Suzhou, China (former)
University of Science and Technology of China, School of Computer Science and Technology, Hefei, China (former, PhD)

According to our database¹, Zongzhang Zhang authored at least 103 papers between 2010 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

Unleashing Humanoid Reaching Potential via Real-World-Ready Skill Space.

IEEE Robotics Autom. Lett., February, 2026

ASTER: Agentic Scaling with Tool-integrated Extended Reasoning.

CoRR, February, 2026

Reward Model Evaluation via Automatically-Ranked Policy Alignment.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Meta-Normalizing Flow for Data-Limited Offline Meta-Reinforcement Learning (Student Abstract).

Lianghui Liu

Zongzhang Zhang

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Efficient Preference Alignment via Pareto Exploration (Student Abstract).

Pengfei Liu

Rui Kong

Zongzhang Zhang

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

Multi-agent In-context Coordination via Decentralized Memory Retrieval.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

Improving Sample Efficiency of Reinforcement Learning With Background Knowledge From Large Language Models.

IEEE Trans. Neural Networks Learn. Syst., November, 2025

Learning to Coordinate With Different Teammates via Team Probing.

IEEE Trans. Neural Networks Learn. Syst., September, 2025

Generalizable Multi-Modal Adversarial Imitation Learning for Non-Stationary Dynamics.

IEEE Trans. Pattern Anal. Mach. Intell., July, 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving.

CoRR, June, 2025

Efficient Communication via Self-Supervised Information Aggregation for Online and Offline Multiagent Reinforcement Learning.

IEEE Trans. Neural Networks Learn. Syst., May, 2025

Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference.

CoRR, March, 2025

Efficient Multi-Agent Cooperation Learning through Teammate Lookahead.

Trans. Mach. Learn. Res., 2025

Constraining an Unconstrained Multi-agent Policy with offline data.

Neural Networks, 2025

EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, 2025

Reward Models in Deep Reinforcement Learning: A Survey.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Reinforced In-Context Black-Box Optimization.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, 2025

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation.

Frontiers Comput. Sci., December, 2024

Model gradient: unified model and policy learning in model-based reinforcement learning.

Frontiers Comput. Sci., August, 2024

One by One, Continual Coordinating with Humans via Hyper-Teammate Identification.

Trans. Mach. Learn. Res., 2024

Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay.

CoRR, 2024

Hindsight Preference Learning for Offline Preference-based Reinforcement Learning.

CoRR, 2024

Q-Adapter: Training Your LLM Adapter as a Residual Q-Function.

CoRR, 2024

Alpha²: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning.

CoRR, 2024

Robust cooperative multi-agent reinforcement learning via multi-view message certification.

Sci. China Inf. Sci., 2024

Multi-agent policy transfer via task relationship modeling.

Sci. China Inf. Sci., 2024

ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-Agent Domain Calibration with a Handful of Offline Data.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Language Model Self-improvement by Reinforcement Learning Contemplation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Attention-Guided Contrastive Role Representations for Multi-agent Reinforcement Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation.

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

Deep Anomaly Detection via Active Anomaly Search.

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

Multi-Expert Distillation for Few-Shot Coordination (Student Abstract).

Yujian Zhu

Hao Ding

Zongzhang Zhang

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Generalizable Policy Improvement via Reinforcement Sampling (Student Abstract).

Rui Kong

Chenyang Wu

Zongzhang Zhang

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Focus-Then-Decide: Segmentation-Assisted Reinforcement Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments.

CoRR, 2023

Robust Multi-agent Communication via Multi-view Message Certification.

CoRR, 2023

Efficient Communication via Self-supervised Information Aggregation for Online and Offline Multi-agent Reinforcement Learning.

CoRR, 2023

Internal Logical Induction for Pixel-Symbolic Reinforcement Learning.

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Policy Regularization with Dataset Constraint for Offline Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2023

Retrosynthetic Planning with Dual Value Networks.

Proceedings of the International Conference on Machine Learning, 2023

Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement.

Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Model-Based Offline Weighted Policy Optimization (Student Abstract).

Renzhe Zhou

Zongzhang Zhang

Yang Yu

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Anti-drifting Feature Selection via Deep Reinforcement Learning (Student Abstract).

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning.

Weijian Liao

Zongzhang Zhang

Yang Yu

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Learning Generalizable Batch Active Learning Strategies via Deep Q-networks (Student Abstract).

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Expert Data Augmentation in Imitation Learning (Student Abstract).

Fuguang Han

Zongzhang Zhang

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Towards Deployment-Efficient and Collision-Free Multi-Agent Path Finding (Student Abstract).

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Deep Anomaly Detection and Search via Reinforcement Learning (Student Abstract).

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Multi-Agent Policy Transfer via Task Relationship Modeling.

CoRR, 2022

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Multi-agent Communication via Self-supervised Information Aggregation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-agent Dynamic Algorithm Configuration.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-Agent Concentrative Coordination with Decentralized Task Representation.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Efficient Multi-Agent Communication via Shapley Message Value.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Multi-Agent Incentive Communication via Decentralized Teammate Modeling.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Efficient policy detecting and reusing for non-stationarity in Markov games.

Auton. Agents Multi Agent Syst., 2021

Adaptive Online Packing-guided Search for POMDPs.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Enhancing Context-Based Meta-Reinforcement Learning Algorithms via An Efficient Task Encoder (Student Abstract).

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract).

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments.

J. Comput. Sci. Technol., 2020

Efficient Deep Reinforcement Learning via Adaptive Policy Transfer.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Double Replay Buffers with Restricted Gradient.

Linjing Zhang

Zongzhang Zhang

Proceedings of the Neural Information Processing - 27th International Conference, 2020

Recency-Weighted Acceleration for Continuous Control Through Deep Reinforcement Learning.

Zhen Wu

Zongzhang Zhang

Xiaofang Zhang

Proceedings of the Neural Information Processing - 27th International Conference, 2020

Efficient Deep Reinforcement Learning through Policy Transfer.

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Generative Adversarial Imitation Learning from Failed Experiences (Student Abstract).

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Third-Person Imitation Learning via Image Difference and Variational Discriminator Bottleneck (Student Abstract).

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Efficient reinforcement learning in continuous state and action spaces with Dyna and policy approximation.

Frontiers Comput. Sci., 2019

Monte Carlo Tree Search for Policy Optimization.

Xiaobai Ma

Katherine Rose Driggs-Campbell

Zongzhang Zhang

Mykel J. Kochenderfer

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Experience Selection in Multi-agent Deep Reinforcement Learning.

Yishen Wang

Zongzhang Zhang

Proceedings of the 31st IEEE International Conference on Tools with Artificial Intelligence, 2019

Deep Recurrent Policy Networks for Planning Under Partial Observability.

Zixuan Chen

Zongzhang Zhang

Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2019: Theoretical Neural Computation, 2019

2018

Hierarchical Deep Multiagent Reinforcement Learning.

CoRR, 2018

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments.

Yan Zheng

Jianye Hao

Zongzhang Zhang

CoRR, 2018

Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments.

Proceedings of the PRICAI 2018: Trends in Artificial Intelligence, 2018

ACGAIL: Imitation Learning About Multiple Intentions with Auxiliary Classifier GANs.

Jiahao Lin

Zongzhang Zhang

Proceedings of the PRICAI 2018: Trends in Artificial Intelligence, 2018

A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Asynchronous Value Iteration Network.

Zhiyuan Pan

Zongzhang Zhang

Zixuan Chen

Proceedings of the Neural Information Processing - 25th International Conference, 2018

2017

Weighted Double Q-learning.

Zongzhang Zhang

Zhiyuan Pan

Mykel J. Kochenderfer

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

2016

Reasoning and predicting POMDP planning complexity via covering numbers.

Frontiers Comput. Sci., 2016

Policy graph pruning and optimization in Monte Carlo Value Iteration for continuous-state POMDPs.

Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence, 2016

Deep Q-Learning with Prioritized Sampling.

Proceedings of the Neural Information Processing - 23rd International Conference, 2016

Covering Number: Analyses for Approximate Continuous-state POMDP Planning (Extended Abstract).

Zongzhang Zhang

Quan Liu

Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 2016

2015

PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces.

Proceedings of the Eighth Annual Symposium on Combinatorial Search, 2015

Intelligent Model Learning Based on Variance for Bayesian Reinforcement Learning.

Proceedings of the 27th IEEE International Conference on Tools with Artificial Intelligence, 2015

Trajectory Sampling Value Iteration: Improved Dyna Search for MDPs.

Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015

2014

Covering Number for Efficient Heuristic-based POMDP Planning.

Zongzhang Zhang

David Hsu

Wee Sun Lee

Proceedings of the 31st International Conference on Machine Learning, 2014

Thompson Sampling Based Monte-Carlo Planning in POMDPs.

Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling, 2014

2012

FHHOP: A Factored Hybrid Heuristic Online Planning Algorithm for Large POMDPs.

Zongzhang Zhang

Xiaoping Chen

Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012

Covering Number as a Complexity Measure for POMDP Planning and Learning.

Zongzhang Zhang

Michael L. Littman

Xiaoping Chen

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2010

Accelerating Point-Based POMDP Algorithms via Greedy Strategies.

Zongzhang Zhang

Xiaoping Chen

Proceedings of the Simulation, Modeling, and Programming for Autonomous Robots, 2010
