Yaodong Yang

Larry Olanrewaju Orimoloye

CoRR, October, 2025

SafeMT: Multi-turn Safety for Multimodal Language Models.

CoRR, October, 2025

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning.

CoRR, October, 2025

On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations.

CoRR, October, 2025

DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation.

CoRR, September, 2025

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning.

CoRR, September, 2025

ReDMan: reliable dexterous manipulation with safe reinforcement learning.

Mach. Learn., August, 2025

Enhancing LLM-Based Social Bot via an Adversarial Learning Framework.

CoRR, August, 2025

Goal Discovery with Causal Capacity for Efficient Reinforcement Learning.

CoRR, August, 2025

Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints.

CoRR, August, 2025

Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance.

CoRR, August, 2025

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies.

CoRR, August, 2025

Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny.

CoRR, July, 2025

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective.

CoRR, July, 2025

Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games.

IEEE Trans. Neural Networks Learn. Syst., June, 2025

ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes.

CoRR, June, 2025

A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs.

CoRR, June, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.

CoRR, June, 2025

SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning.

CoRR, June, 2025

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback.

CoRR, May, 2025

From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation.

CoRR, May, 2025

Risk-aware Direct Preference Optimization under Nested Risk Measure.

CoRR, May, 2025

The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels.

CoRR, May, 2025

EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding.

CoRR, May, 2025

Mitigating Deceptive Alignment via Self-Monitoring.

CoRR, May, 2025

Generative RLHF-V: Learning Principles from Multi-modal Human Preference.

CoRR, May, 2025

Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation.

CoRR, May, 2025

J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge.

CoRR, May, 2025

Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society.

CoRR, April, 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment.

CoRR, April, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.

CoRR, April, 2025

JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

Dexterous Non-Prehensile Manipulation for Ungraspable Object via Extrinsic Dexterity.

CoRR, March, 2025

Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models.

CoRR, March, 2025

ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs.

CoRR, March, 2025

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning.

CoRR, March, 2025

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping.

CoRR, February, 2025

Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand.

CoRR, February, 2025

Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning.

CoRR, February, 2025

RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?

CoRR, January, 2025

Approximating N-Player Nash Equilibrium through Gradient Descent.

CoRR, January, 2025

Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.

Neural Networks, 2025

TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning.

Neural Networks, 2025

Can large language models independently complete tasks? A dynamic evaluation framework for multi-turn task planning and completion.

Neurocomputing, 2025

Mean Field Correlated Imitation Learning.

Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

Carbon Trading Supply Chain Management Based on Constrained Deep Reinforcement Learning.

Qinghao Wang

Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

Hierarchical Multi-Agent Framework for Dynamic Macroeconomic Modelling Using Large Language Models.

Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, 2025

SAE-V: Interpreting Multimodal Models for Enhanced Alignment.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Falcon: Fast Visuomotor Policies via Partial Denoising.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

In-Context Editing: Learning Knowledge from Self-Induced Distributions.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

A Unified Framework for Multi-Stage Decision Optimization with Deep Reinforcement Learning and Foundation Models.

Proceedings of the 21st IEEE International Conference on Automation Science and Engineering, 2025

Heterogeneous Value Alignment Evaluation for Large Language Models.

Proceedings of the Artificial General Intelligence - 18th International Conference, 2025

Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems.

Proceedings of the Artificial General Intelligence - 18th International Conference, 2025

Reward Generalization in RLHF: A Topological Perspective.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Benchmarking Multi-National Value Alignment for Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Language Models Resist Alignment: Evidence From Data Compression.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

SafeLawBench: Towards Safe Alignment of Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Differentiable Information Enhanced Model-Based Reinforcement Learning.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Self-Supervised MAFENN for Classifying Low-Labeled Distorted Images Over Mobile Fading Channels.

IEEE Trans. Mob. Comput., August, 2024

ASP: Learn a Universal Neural Solver!

IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

Grasp Multiple Objects With One Hand.

IEEE Robotics Autom. Lett., May, 2024

Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning.

Trans. Mach. Learn. Res., 2024

RoMAT: Role-based multi-agent transformer for generalizable heterogeneous cooperation.

Neural Networks, 2024

Adaptive pessimism via target Q-value for offline reinforcement learning.

Neural Networks, 2024

Heterogeneous-Agent Reinforcement Learning.

J. Mach. Learn. Res., 2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.

CoRR, 2024

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback.

CoRR, 2024

Random Feature Models with Learnable Activation Functions.

Zailin Ma

Jiansheng Yang

CoRR, 2024

Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters.

CoRR, 2024

Sample-Efficient Regret-Minimizing Double Oracle in Extensive-Form Games.

CoRR, 2024

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment.

CoRR, 2024

Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games.

CoRR, 2024

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback.

CoRR, 2024

A Survey on Self-play Methods in Reinforcement Learning.

CoRR, 2024

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models.

CoRR, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.

CoRR, 2024

Language Models Resist Alignment.

CoRR, 2024

Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles.

CoRR, 2024

Efficient Model-agnostic Alignment via Bayesian Persuasion.

CoRR, 2024

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation.

CoRR, 2024

Correlated Mean Field Imitation Learning.

CoRR, 2024

INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations.

CoRR, 2024

UniDexFPM: Universal Dexterous Functional Pre-grasp Manipulation Via Diffusion Policy.

CoRR, 2024

Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games.

CoRR, 2024

Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects.

CoRR, 2024

Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective.

CoRR, 2024

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction.

CoRR, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents.

CoRR, 2024

Panacea: Pareto Alignment via Preference Adaptation for LLMs.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

ProgressGym: Alignment with a Millennium of Moral Progress.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Aligner: Efficient Alignment by Learning to Correct.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Remember the Past for Better Future: Memory-Augmented Offline RL.

Proceedings of the International Joint Conference on Neural Networks, 2024

Off-Agent Trust Region Policy Optimization.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Maximum Entropy Heterogeneous-Agent Reinforcement Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Safe RLHF: Safe Reinforcement Learning from Human Feedback.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SafeDreamer: Safe Reinforcement Learning with World Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Object-Centric Dexterous Manipulation from Human Motion Data.

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary.

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory.

Proceedings of the Web and Big Data - 8th International Joint Conference, 2024

ProAgent: Building Proactive Cooperative Agents with Large Language Models.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Large sequence models for sequential decision-making: a survey.

Frontiers Comput. Sci., December, 2023

Safe multi-agent reinforcement learning for multi-robot control.

Artif. Intell., June, 2023

Online Markov decision processes with non-oblivious strategic adversary.

Auton. Agents Multi Agent Syst., June, 2023

Offline Pre-trained Multi-agent Decision Transformer.

Mach. Intell. Res., April, 2023

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games.

Trans. Mach. Learn. Res., 2023

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.

J. Mach. Learn. Res., 2023

TorchOpt: An Efficient Library for Differentiable Optimization.

J. Mach. Learn. Res., 2023

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library.

J. Mach. Learn. Res., 2023

AI Alignment: A Comprehensive Survey.

CoRR, 2023

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark.

CoRR, 2023

Masked Pretraining for Multi-Agent Decision Making.

CoRR, 2023

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization.

CoRR, 2023

Measuring Value Understanding in Language Models through Discriminator-Critique Gap.

CoRR, 2023

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models.

CoRR, 2023

Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators.

CoRR, 2023

ProAgent: Building Proactive Cooperative AI with Large Language Models.

CoRR, 2023

Safe DreamerV3: Safe Reinforcement Learning with World Models.

CoRR, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.

CoRR, 2023

Maximum Entropy Heterogeneous-Agent Mirror Learning.

CoRR, 2023

Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork.

CoRR, 2023

Heterogeneous Value Evaluation for Large Language Models.

CoRR, 2023

Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game.

CoRR, 2023

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research.

CoRR, 2023

Heterogeneous-Agent Reinforcement Learning.

CoRR, 2023

STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning.

CoRR, 2023

A Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors.

CoRR, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.

CoRR, 2023

MSRL: Distributed Reinforcement Learning with Dataflow Fragments.

Proceedings of the 2023 USENIX Annual Technical Conference, 2023

Multi-Agent First Order Constrained Optimization in Policy Space.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Policy Space Diversity for Non-Transitive Games.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hierarchical Multi-Agent Skill Discovery.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

GenDexGrasp: Generalizable Dexterous Grasping.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

RLAfford: End-to-End Affordance Learning for Robotic Manipulation.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models.

Proceedings of the International Conference on Machine Learning, 2023

Regret-Minimizing Double Oracle for Extensive-Form Games.

Xiaohang Tang

Le Cong Dinh

Proceedings of the International Conference on Machine Learning, 2023

A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems.

Oliver Slumbers

David Henry Mguni

Stefano B. Blumberg

Proceedings of the International Conference on Machine Learning, 2023

MANSA: Learning Fast and Slow in Multi-Agent Systems.

Feifei Tong

Proceedings of the International Conference on Machine Learning, 2023

Quality-Similar Diversity via Population Based Reinforcement Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning.

Proceedings of the ECAI 2023 - 26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland, 2023

Dynamic Handover: Throw and Catch with Bimanual Hands.

Proceedings of the Conference on Robot Learning, 2023

Is Nash Equilibrium Approximator Learnable?

Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Learning to Shape Rewards Using a Game of Two Partners.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Online Double Oracle.

Le Cong Dinh

Trans. Mach. Learn. Res., 2022

Illiquidity Comovement and Market Crisis.

J. Syst. Sci. Complex., 2022

Contextual Transformer for Offline Meta Reinforcement Learning.

CoRR, 2022

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning.

CoRR, 2022

End-to-End Affordance Learning for Robotic Manipulation.

CoRR, 2022

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL.

CoRR, 2022

Fully Decentralized Model-based Policy Optimization for Networked Systems.

CoRR, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.

Hao Dong

Zongqing Lu

Song-Chun Zhu

CoRR, 2022

Learning Risk-Averse Equilibria in Multi-Agent Systems.

CoRR, 2022

A Review of Safe Reinforcement Learning: Methods, Theory and Applications.

CoRR, 2022

Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning.

Zehao Dou

Jakub Grudzien Kuba

CoRR, 2022

Settling the Communication Complexity for Distributed Offline Reinforcement Learning.

Juliusz Krysztof Ziomek

CoRR, 2022

Efficient Policy Space Response Oracles.

CoRR, 2022

Measuring the Non-Transitivity in Chess.

Ricky Sanjaya

Algorithms, 2022

Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation.

Proceedings of the Web Information Systems Engineering - WISE 2022, 2022

Constrained Update Projection Approach to Safe Policy Optimization.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scalable Model-based Policy Optimization for Decentralized Networked Systems.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

On the Convergence of Fictitious Play: A Decomposition Approach.

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

A Game-Theoretic Approach to Multi-agent Trust Region Optimization.

Proceedings of the Distributed Artificial Intelligence - 4th International Conference, 2022

2021

Many-agent reinforcement learning.

PhD thesis, 2021

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games.

Electron. Colloquium Comput. Complex., 2021

Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning.

CoRR, 2021

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks.

CoRR, 2021

A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers.

CoRR, 2021

DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention.

CoRR, 2021

Multi-Agent Constrained Policy Optimisation.

CoRR, 2021

Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics.

CoRR, 2021

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning.

CoRR, 2021

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.

CoRR, 2021

Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games.

CoRR, 2021

Learning to Shape Rewards using a Game of Switching Controls.

CoRR, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games.

CoRR, 2021

Online Double Oracle.

CoRR, 2021

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Settling the Variance of Multi-Agent Policy Gradients.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Neural Auto-Curricula in Two-Player Zero-Sum Games.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MyoChallenge 2022: Learning contact-rich manipulation using a musculoskeletal hand.

Proceedings of the NeurIPS 2022 Competition Track, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games.

Proceedings of the 38th International Conference on Machine Learning, 2021

Learning in Nonzero-Sum Stochastic Games with Potentials.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Order Execution Probability and Order Queue in Limit Order Markets.

J. Syst. Sci. Complex., 2020

Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting.

Johnnie E. V. Johnson

Eur. J. Oper. Res., 2020

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective.

CoRR, 2020

Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning.

Ying Wen

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Multi-Agent Determinantal Q-Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

Learning to Infer User Hidden States for Online Sequential Advertising.

Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation.

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Sequential Advertising Agent with Interpretable User Hidden Intents.

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Bi-Level Actor-Critic for Multi-Agent Coordination.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.

CoRR, 2019

Multi-Agent Generalized Recursive Reasoning.

CoRR, 2019

Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning.

Proceedings of the World Wide Web Conference, 2019

Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning.

Proceedings of the 7th International Conference on Learning Representations, 2019

Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models.
