Yaodong Yang

ORCID: 0000-0001-8132-5613

Affiliations:
  • Peking University, Institute for AI, Beijing, China
  • King's College London, UK (former)
  • Huawei Technologies, Noah's Ark Lab, UK (former)
  • University College London, UK (PhD)

According to our database, Yaodong Yang authored at least 231 papers between 2017 and 2025.

Bibliography

2025
ReDMan: reliable dexterous manipulation with safe reinforcement learning.
Mach. Learn., August, 2025

Goal Discovery with Causal Capacity for Efficient Reinforcement Learning.
CoRR, August, 2025

Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints.
CoRR, August, 2025

Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance.
CoRR, August, 2025

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies.
CoRR, August, 2025

Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.
CoRR, July, 2025

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective.
CoRR, July, 2025

Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games.
IEEE Trans. Neural Networks Learn. Syst., June, 2025

ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes.
CoRR, June, 2025

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs.
CoRR, June, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.
CoRR, June, 2025

SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning.
CoRR, June, 2025

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback.
CoRR, May, 2025

From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation.
CoRR, May, 2025

Risk-aware Direct Preference Optimization under Nested Risk Measure.
CoRR, May, 2025

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels.
CoRR, May, 2025

EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding.
CoRR, May, 2025

Mitigating Deceptive Alignment via Self-Monitoring.
CoRR, May, 2025

Generative RLHF-V: Learning Principles from Multi-modal Human Preference.
CoRR, May, 2025

Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation.
CoRR, May, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.
CoRR, May, 2025

Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society.
CoRR, April, 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment.
CoRR, April, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.
CoRR, April, 2025

JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Dexterous Non-Prehensile Manipulation for Ungraspable Object via Extrinsic Dexterity.
CoRR, March, 2025

Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models.
CoRR, March, 2025

ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs.
CoRR, March, 2025

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning.
CoRR, March, 2025

Fast Visuomotor Policies via Partial Denoising.
CoRR, March, 2025

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping.
CoRR, February, 2025

Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand.
CoRR, February, 2025

SAE-V: Interpreting Multimodal Models for Enhanced Alignment.
CoRR, February, 2025

Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning.
CoRR, February, 2025

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?
CoRR, January, 2025

Approximating N-Player Nash Equilibrium through Gradient Descent.
CoRR, January, 2025

Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.
Neural Networks, 2025

TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning.
Neural Networks, 2025

Can large language models independently complete tasks? A dynamic evaluation framework for multi-turn task planning and completion.
Neurocomputing, 2025

Mean Field Correlated Imitation Learning.
Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

Hierarchical Multi-Agent Framework for Dynamic Macroeconomic Modelling Using Large Language Models.
Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

In-Context Editing: Learning Knowledge from Self-Induced Distributions.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Heterogeneous Value Alignment Evaluation for Large Language Models.
Proceedings of the Artificial General Intelligence - 18th International Conference, 2025

Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems.
Proceedings of the Artificial General Intelligence - 18th International Conference, 2025

Reward Generalization in RLHF: A Topological Perspective.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Language Models Resist Alignment: Evidence From Data Compression.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Differentiable Information Enhanced Model-Based Reinforcement Learning.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Carbon trading supply chain management based on constrained deep reinforcement learning.
Auton. Agents Multi Agent Syst., December, 2024

Self-Supervised MAFENN for Classifying Low-Labeled Distorted Images Over Mobile Fading Channels.
IEEE Trans. Mob. Comput., August, 2024

ASP: Learn a Universal Neural Solver!
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Grasp Multiple Objects With One Hand.
IEEE Robotics Autom. Lett., May, 2024

Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning.
Trans. Mach. Learn. Res., 2024

RoMAT: Role-based multi-agent transformer for generalizable heterogeneous cooperation.
Neural Networks, 2024

Adaptive pessimism via target Q-value for offline reinforcement learning.
Neural Networks, 2024

Heterogeneous-Agent Reinforcement Learning.
J. Mach. Learn. Res., 2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.
CoRR, 2024

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback.
CoRR, 2024

Random Feature Models with Learnable Activation Functions.
CoRR, 2024

Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters.
CoRR, 2024

Sample-Efficient Regret-Minimizing Double Oracle in Extensive-Form Games.
CoRR, 2024

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment.
CoRR, 2024

Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games.
CoRR, 2024

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.
CoRR, 2024

A Survey on Self-play Methods in Reinforcement Learning.
CoRR, 2024

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models.
CoRR, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.
CoRR, 2024

Language Models Resist Alignment.
CoRR, 2024

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles.
CoRR, 2024

Efficient Model-agnostic Alignment via Bayesian Persuasion.
CoRR, 2024

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation.
CoRR, 2024

Correlated Mean Field Imitation Learning.
CoRR, 2024

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations.
CoRR, 2024

UniDexFPM: Universal Dexterous Functional Pre-grasp Manipulation Via Diffusion Policy.
CoRR, 2024

Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games.
CoRR, 2024

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects.
CoRR, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective.
CoRR, 2024

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction.
CoRR, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents.
CoRR, 2024

Panacea: Pareto Alignment via Preference Adaptation for LLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ProgressGym: Alignment with a Millennium of Moral Progress.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Aligner: Efficient Alignment by Learning to Correct.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Remember the Past for Better Future: Memory-Augmented Offline RL.
Proceedings of the International Joint Conference on Neural Networks, 2024

Off-Agent Trust Region Policy Optimization.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Maximum Entropy Heterogeneous-Agent Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Safe RLHF: Safe Reinforcement Learning from Human Feedback.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SafeDreamer: Safe Reinforcement Learning with World Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Object-Centric Dexterous Manipulation from Human Motion Data.
Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory.
Proceedings of the Web and Big Data - 8th International Joint Conference, 2024

ProAgent: Building Proactive Cooperative Agents with Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Large sequence models for sequential decision-making: a survey.
Frontiers Comput. Sci., December, 2023

Safe multi-agent reinforcement learning for multi-robot control.
Artif. Intell., June, 2023

Online Markov decision processes with non-oblivious strategic adversary.
Auton. Agents Multi Agent Syst., June, 2023

Offline Pre-trained Multi-agent Decision Transformer.
Mach. Intell. Res., April, 2023

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games.
Trans. Mach. Learn. Res., 2023

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
J. Mach. Learn. Res., 2023

TorchOpt: An Efficient Library for Differentiable Optimization.
J. Mach. Learn. Res., 2023

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library.
J. Mach. Learn. Res., 2023

AI Alignment: A Comprehensive Survey.
CoRR, 2023

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
CoRR, 2023

Masked Pretraining for Multi-Agent Decision Making.
CoRR, 2023

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization.
CoRR, 2023

Measuring Value Understanding in Language Models through Discriminator-Critique Gap.
CoRR, 2023

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models.
CoRR, 2023

Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators.
CoRR, 2023

ProAgent: Building Proactive Cooperative AI with Large Language Models.
CoRR, 2023

Safe DreamerV3: Safe Reinforcement Learning with World Models.
CoRR, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
CoRR, 2023

Maximum Entropy Heterogeneous-Agent Mirror Learning.
CoRR, 2023

Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork.
CoRR, 2023

Heterogeneous Value Evaluation for Large Language Models.
CoRR, 2023

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game.
CoRR, 2023

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research.
CoRR, 2023

Heterogeneous-Agent Reinforcement Learning.
CoRR, 2023

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning.
CoRR, 2023

A Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors.
CoRR, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.
CoRR, 2023

MSRL: Distributed Reinforcement Learning with Dataflow Fragments.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Multi-Agent First Order Constrained Optimization in Policy Space.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Policy Space Diversity for Non-Transitive Games.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hierarchical Multi-Agent Skill Discovery.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GenDexGrasp: Generalizable Dexterous Grasping.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

RLAfford: End-to-End Affordance Learning for Robotic Manipulation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models.
Proceedings of the International Conference on Machine Learning, 2023

Regret-Minimizing Double Oracle for Extensive-Form Games.
Proceedings of the International Conference on Machine Learning, 2023

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems.
Proceedings of the International Conference on Machine Learning, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.
Proceedings of the International Conference on Machine Learning, 2023

Quality-Similar Diversity via Population Based Reinforcement Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

Dynamic Handover: Throw and Catch with Bimanual Hands.
Proceedings of the Conference on Robot Learning, 2023

Is Nash Equilibrium Approximator Learnable?
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Learning to Shape Rewards Using a Game of Two Partners.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Online Double Oracle.
Trans. Mach. Learn. Res., 2022

Illiquidity Comovement and Market Crisis.
J. Syst. Sci. Complex., 2022

Contextual Transformer for Offline Meta Reinforcement Learning.
CoRR, 2022

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning.
CoRR, 2022

End-to-End Affordance Learning for Robotic Manipulation.
CoRR, 2022

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL.
CoRR, 2022

Fully Decentralized Model-based Policy Optimization for Networked Systems.
CoRR, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.
CoRR, 2022

Learning Risk-Averse Equilibria in Multi-Agent Systems.
CoRR, 2022

A Review of Safe Reinforcement Learning: Methods, Theory and Applications.
CoRR, 2022

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning.
CoRR, 2022

Settling the Communication Complexity for Distributed Offline Reinforcement Learning.
CoRR, 2022

Efficient Policy Space Response Oracles.
CoRR, 2022

Measuring the Non-Transitivity in Chess.
Algorithms, 2022

Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation.
Proceedings of the Web Information Systems Engineering - WISE 2022, 2022

Constrained Update Projection Approach to Safe Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scalable Model-based Policy Optimization for Decentralized Networked Systems.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

On the Convergence of Fictitious Play: A Decomposition Approach.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

A Game-Theoretic Approach to Multi-agent Trust Region Optimization.
Proceedings of the Distributed Artificial Intelligence - 4th International Conference, 2022

2021
Many-agent reinforcement learning.
PhD thesis, 2021

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games.
Electron. Colloquium Comput. Complex., 2021

Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning.
CoRR, 2021

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks.
CoRR, 2021

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers.
CoRR, 2021

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention.
CoRR, 2021

Multi-Agent Constrained Policy Optimisation.
CoRR, 2021

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics.
CoRR, 2021

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
CoRR, 2021

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.
CoRR, 2021

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games.
CoRR, 2021

Learning to Shape Rewards using a Game of Switching Controls.
CoRR, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games.
CoRR, 2021

Online Double Oracle.
CoRR, 2021

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Settling the Variance of Multi-Agent Policy Gradients.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Auto-Curricula in Two-Player Zero-Sum Games.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning in Nonzero-Sum Stochastic Games with Potentials.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Order Execution Probability and Order Queue in Limit Order Markets.
J. Syst. Sci. Complex., 2020

Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting.
Eur. J. Oper. Res., 2020

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective.
CoRR, 2020

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Multi-Agent Determinantal Q-Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning to Infer User Hidden States for Online Sequential Advertising.
Proceedings of CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

αα-Rank: Practically Scaling α-Rank through Stochastic Optimisation.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Sequential Advertising Agent with Interpretable User Hidden Intents.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Bi-Level Actor-Critic for Multi-Agent Coordination.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.
CoRR, 2019

Multi-Agent Generalized Recursive Reasoning.
CoRR, 2019

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.
Proceedings of the World Wide Web Conference, 2019

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Factorized Q-learning for large-scale multi-agent systems.
Proceedings of the First International Conference on Distributed Artificial Intelligence, 2019

2018
Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series.
CoRR, 2018

Factorized Q-Learning for Large-Scale Multi-Agent Systems.
CoRR, 2018

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mean Field Multi-Agent Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

A Study of AI Population Dynamics with Million-agent Reinforcement Learning.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017
An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning.
CoRR, 2017

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games.
CoRR, 2017
