Yuanzhao Zhai

Orcid: 0000-0003-1385-0074

According to our database, Yuanzhao Zhai authored at least 31 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Uncertainty-penalized reinforcement learning from human feedback with diversified reward LoRA ensembles.
Inf. Process. Manag., 2026

2025
Empowering Large Language Model Agent through Step-Level Self-Critique and Self-Training.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Preference-Strength-Aware Self-Improving Alignment with Generative Preference Models.
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2025

Extracting Reasoning Patterns from Knowledge Graph to Enhance LLMs' Reasoning Capability.
Proceedings of the 2025 IEEE International Conference on Joint Cloud Computing (JCC), 2025

GRACE: Graph-Adapted Case-Augmented Execution for Tool Use in Large Language Models.
Proceedings of the 31st IEEE International Conference on Parallel and Distributed Systems, 2025

COPR: Continual Human Preference Learning via Optimal Policy Regularization.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Correcting Large Language Model Behavior via Influence Function.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning.
IEEE Trans. Artif. Intell., November, 2024

Nuclear Norm Maximization-Based Curiosity-Driven Reinforcement Learning.
IEEE Trans. Artif. Intell., May, 2024

Dynamic Memory-Based Curiosity: A Bootstrap Approach for Exploration in Reinforcement Learning.
IEEE Trans. Emerg. Top. Comput. Intell., April, 2024

C3F: Constant Collaboration and Communication Framework for Graph-Representation Dynamic Multi-Robotic Systems.
IEEE Robotics Autom. Lett., January, 2024

Online Self-Preferring Language Models.
CoRR, 2024

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles.
CoRR, 2024

Iterative Regularized Policy Optimization with Imperfect Demonstrations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Nuclear-Norm Maximization for Low-Rank Updates.
Proceedings of the IEEE International Conference on Acoustics, 2024

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
COPF: Continual Learning Human Preference through Optimal Policy Fitting.
CoRR, 2023

Bi-level Multi-Agent Actor-Critic Methods with Transformers.
Proceedings of the IEEE International Conference on Joint Cloud Computing, 2023

Diversifying Message Aggregation in Multi-Agent Communication Via Normalized Tensor Nuclear Norm Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2023

Progressive Diversifying Policy for Multi-Agent Reinforcement Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
CRMRL: Collaborative Relationship Meta Reinforcement Learning for Effectively Adapting to Type Changes in Multi-Robotic System.
IEEE Robotics Autom. Lett., 2022

A Fast and Robust Solution for Common Knowledge Formation in Decentralized Swarm Robots.
J. Intell. Robotic Syst., 2022

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning.
CoRR, 2022

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration.
CoRR, 2022

Exploring Policy Diversity in Parallel Actor-Critic Learning.
Proceedings of the 34th IEEE International Conference on Tools with Artificial Intelligence, 2022

Pseudo Reward and Action Importance Classification for Sparse Reward Problem.
Proceedings of the ICMLC 2022: 14th International Conference on Machine Learning and Computing, Guangzhou, China, February 18, 2022

2021
Cloudroid Swarm: A QoS-Aware Framework for Multirobot Cooperation Offloading.
Wirel. Commun. Mob. Comput., 2021

Decentralized Multi-Robot Collision Avoidance in Complex Scenarios With Selective Communication.
IEEE Robotics Autom. Lett., 2021

Accelerating Robot Reinforcement Learning with Samples of Different Simulation Precision.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

2020
Cooperative Offloading for Multiple Robot Applications.
Proceedings of the IEEE International Conference on Joint Cloud Computing, 2020

