Yaodong Yang

ORCID: 0000-0001-8132-5613

Affiliations:
  • Peking University, Institute for AI, Beijing, China
  • King's College London, UK (former)
  • Huawei Technologies, Noah's Ark Lab, UK (former)
  • University College London, UK (PhD)


According to our database, Yaodong Yang authored at least 135 papers between 2017 and 2024.

Bibliography

2024
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects.
CoRR, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective.
CoRR, 2024

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction.
CoRR, 2024

Panacea: Pareto Alignment via Preference Adaptation for LLMs.
CoRR, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents.
CoRR, 2024

ProAgent: Building Proactive Cooperative Agents with Large Language Models.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Large sequence models for sequential decision-making: a survey.
Frontiers Comput. Sci., December, 2023

Safe multi-agent reinforcement learning for multi-robot control.
Artif. Intell., June, 2023

Online Markov decision processes with non-oblivious strategic adversary.
Auton. Agents Multi Agent Syst., June, 2023

Offline Pre-trained Multi-agent Decision Transformer.
Mach. Intell. Res., April, 2023

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
J. Mach. Learn. Res., 2023

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models.
CoRR, 2023

AI Alignment: A Comprehensive Survey.
CoRR, 2023

Grasp Multiple Objects with One Hand.
CoRR, 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback.
CoRR, 2023

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
CoRR, 2023

Masked Pretraining for Multi-Agent Decision Making.
CoRR, 2023

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization.
CoRR, 2023

Measuring Value Understanding in Language Models through Discriminator-Critique Gap.
CoRR, 2023

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models.
CoRR, 2023

Dynamic Handover: Throw and Catch with Bimanual Hands.
CoRR, 2023

Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators.
CoRR, 2023

ProAgent: Building Proactive Cooperative AI with Large Language Models.
CoRR, 2023

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games.
CoRR, 2023

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning.
CoRR, 2023

Safe DreamerV3: Safe Reinforcement Learning with World Models.
CoRR, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
CoRR, 2023

Maximum Entropy Heterogeneous-Agent Mirror Learning.
CoRR, 2023

Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork.
CoRR, 2023

Heterogeneous Value Evaluation for Large Language Models.
CoRR, 2023

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game.
CoRR, 2023

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research.
CoRR, 2023

Heterogeneous-Agent Reinforcement Learning.
CoRR, 2023

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning.
CoRR, 2023

ASP: Learn a Universal Neural Solver!
CoRR, 2023

A Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors.
CoRR, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.
CoRR, 2023

MSRL: Distributed Reinforcement Learning with Dataflow Fragments.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Multi-Agent First Order Constrained Optimization in Policy Space.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Policy Space Diversity for Non-Transitive Games.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hierarchical Multi-Agent Skill Discovery.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GenDexGrasp: Generalizable Dexterous Grasping.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

RLAfford: End-to-End Affordance Learning for Robotic Manipulation.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models.
Proceedings of the International Conference on Machine Learning, 2023

Regret-Minimizing Double Oracle for Extensive-Form Games.
Proceedings of the International Conference on Machine Learning, 2023

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems.
Proceedings of the International Conference on Machine Learning, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.
Proceedings of the International Conference on Machine Learning, 2023

Quality-Similar Diversity via Population Based Reinforcement Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning.
Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

Dynamic Handover: Throw and Catch with Bimanual Hands.
Proceedings of the Conference on Robot Learning, 2023

Is Nash Equilibrium Approximator Learnable?
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Learning to Shape Rewards Using a Game of Two Partners.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Online Double Oracle.
Trans. Mach. Learn. Res., 2022

Illiquidity Comovement and Market Crisis.
J. Syst. Sci. Complex., 2022

Contextual Transformer for Offline Meta Reinforcement Learning.
CoRR, 2022

TorchOpt: An Efficient Library for Differentiable Optimization.
CoRR, 2022

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning.
CoRR, 2022

End-to-End Affordance Learning for Robotic Manipulation.
CoRR, 2022

Constrained Update Projection Approach to Safe Policy Optimization.
CoRR, 2022

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL.
CoRR, 2022

Fully Decentralized Model-based Policy Optimization for Networked Systems.
CoRR, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.
CoRR, 2022

Learning Risk-Averse Equilibria in Multi-Agent Systems.
CoRR, 2022

A Review of Safe Reinforcement Learning: Methods, Theory and Applications.
CoRR, 2022

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning.
CoRR, 2022

Settling the Communication Complexity for Distributed Offline Reinforcement Learning.
CoRR, 2022

Efficient Policy Space Response Oracles.
CoRR, 2022

Measuring the Non-Transitivity in Chess.
Algorithms, 2022

Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation.
Proceedings of the Web Information Systems Engineering - WISE 2022, 2022

Constrained Update Projection Approach to Safe Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scalable Model-based Policy Optimization for Decentralized Networked Systems.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

On the Convergence of Fictitious Play: A Decomposition Approach.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

A Game-Theoretic Approach to Multi-agent Trust Region Optimization.
Proceedings of the Distributed Artificial Intelligence - 4th International Conference, 2022

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games.
Electron. Colloquium Comput. Complex., 2021

Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning.
CoRR, 2021

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks.
CoRR, 2021

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers.
CoRR, 2021

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention.
CoRR, 2021

Multi-Agent Constrained Policy Optimisation.
CoRR, 2021

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics.
CoRR, 2021

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.
CoRR, 2021

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.
CoRR, 2021

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games.
CoRR, 2021

Learning to Shape Rewards using a Game of Switching Controls.
CoRR, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games.
CoRR, 2021

Online Double Oracle.
CoRR, 2021

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Settling the Variance of Multi-Agent Policy Gradients.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Auto-Curricula in Two-Player Zero-Sum Games.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021


Modelling Behavioural Diversity for Learning in Open-Ended Games.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning in Nonzero-Sum Stochastic Games with Potentials.
Proceedings of the 38th International Conference on Machine Learning, 2021

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020
Order Execution Probability and Order Queue in Limit Order Markets.
J. Syst. Sci. Complex., 2020

Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting.
Eur. J. Oper. Res., 2020

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective.
CoRR, 2020

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving.
CoRR, 2020

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Multi-Agent Determinantal Q-Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020


Learning to Infer User Hidden States for Online Sequential Advertising.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Sequential Advertising Agent with Interpretable User Hidden Intents.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Bi-Level Actor-Critic for Multi-Agent Coordination.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.
CoRR, 2019

Multi-Agent Generalized Recursive Reasoning.
CoRR, 2019

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.
Proceedings of the World Wide Web Conference, 2019

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Factorized Q-learning for large-scale multi-agent systems.
Proceedings of the First International Conference on Distributed Artificial Intelligence, 2019

2018
Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series.
CoRR, 2018

Factorized Q-Learning for Large-Scale Multi-Agent Systems.
CoRR, 2018

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mean Field Multi-Agent Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

A Study of AI Population Dynamics with Million-agent Reinforcement Learning.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017
An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning.
CoRR, 2017

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games.
CoRR, 2017

