Yuanzhao Zhai

Orcid: 0000-0003-1385-0074

According to our database, Yuanzhao Zhai authored at least 31 papers between 2020 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

Uncertainty-penalized reinforcement learning from human feedback with diversified reward LoRA ensembles.

Inf. Process. Manag., 2026

2025

Empowering Large Language Model Agent through Step-Level Self-Critique and Self-Training.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Preference-Strength-Aware Self-Improving Alignment with Generative Preference Models.

Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Extracting Reasoning Patterns from Knowledge Graph to Enhance LLMs' Reasoning Capability.

Proceedings of the 2025 IEEE International Conference on Joint Cloud Computing (JCC), 2025

GRACE: Graph-Adapted Case-Augmented Execution for Tool Use in Large Language Models.

Proceedings of the 31st IEEE International Conference on Parallel and Distributed Systems, 2025

COPR: Continual Human Preference Learning via Optimal Policy Regularization.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Correcting Large Language Model Behavior via Influence Function.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning.

IEEE Trans. Artif. Intell., November, 2024

Nuclear Norm Maximization-Based Curiosity-Driven Reinforcement Learning.

IEEE Trans. Artif. Intell., May, 2024

Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning.

IEEE Trans. Emerg. Top. Comput. Intell., April, 2024

C3F: Constant Collaboration and Communication Framework for Graph-Representation Dynamic Multi-Robotic Systems.

IEEE Robotics Autom. Lett., January, 2024

Online Self-Preferring Language Models.

CoRR, 2024

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles.

CoRR, 2024

Iterative Regularized Policy Optimization with Imperfect Demonstrations.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Nuclear-Norm Maximization for Low-Rank Updates.

Proceedings of the IEEE International Conference on Acoustics, 2024

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

COPF: Continual Learning Human Preference through Optimal Policy Fitting.

CoRR, 2023

Bi-level Multi-Agent Actor-Critic Methods with Transformers.

Proceedings of the IEEE International Conference on Joint Cloud Computing, 2023

Diversifying Message Aggregation in Multi-Agent Communication Via Normalized Tensor Nuclear Norm Regularization.

Proceedings of the IEEE International Conference on Acoustics, 2023

Progressive Diversifying Policy for Multi-Agent Reinforcement Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

CRMRL: Collaborative Relationship Meta Reinforcement Learning for Effectively Adapting to Type Changes in Multi-Robotic System.

IEEE Robotics Autom. Lett., 2022

A Fast and Robust Solution for Common Knowledge Formation in Decentralized Swarm Robots.

J. Intell. Robotic Syst., 2022

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning.

CoRR, 2022

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration.

CoRR, 2022

Exploring Policy Diversity in Parallel Actor-Critic Learning.

Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

Pseudo Reward and Action Importance Classification for Sparse Reward Problem.

Proceedings of the ICMLC 2022: 14th International Conference on Machine Learning and Computing, Guangzhou, China, February 18, 2022

2021

Cloudroid Swarm: A QoS-Aware Framework for Multirobot Cooperation Offloading.

Wirel. Commun. Mob. Comput., 2021

Decentralized Multi-Robot Collision Avoidance in Complex Scenarios With Selective Communication.

IEEE Robotics Autom. Lett., 2021

Accelerating Robot Reinforcement Learning with Samples of Different Simulation Precision.

Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

2020

Cooperative Offloading for Multiple Robot Applications.

Proceedings of the IEEE International Conference on Joint Cloud Computing, 2020
