We stand with Ukraine

Weixun Wang

ORCID: 0000-0002-2727-8948

According to our database, Weixun Wang authored at least 73 papers between 2009 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL.

CoRR, May, 2026

Complementary Reinforcement Learning.

CoRR, March, 2026

CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria.

CoRR, January, 2026

ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants.

CoRR, January, 2026

RollPacker: Taming Long-Tail Rollouts for RL Post-Training with Tail Batching.

Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

Think-J: Learning to Think for Generative LLM-as-a-Judge.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure.

CoRR, December, 2025

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs.

Co-authors include Tianqianjin Lin.

CoRR, December, 2025

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization.

CoRR, October, 2025

Part II: ROLL Flash - Accelerating RLVR and Agentic Training with Asynchrony.

CoRR, October, 2025

Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning.

Co-authors include Johan S. Obando-Ceron, Pablo Samuel Castro, and Aaron C. Courville.

CoRR, October, 2025

GEM: A Gym for Agentic LLMs.

CoRR, October, 2025

RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training.

CoRR, September, 2025

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.

CoRR, August, 2025

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library.

CoRR, June, 2025

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models.

Co-authors include Hongqiong Zhong.

CoRR, May, 2025

Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models.

CoRR, May, 2025

Think-J: Learning to Think for Generative LLM-as-a-Judge.

CoRR, May, 2025

Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation.

CoRR, March, 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Co-authors include Zhaoxiang Zhang.

CoRR, February, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models.

Co-authors include Alexander Zhang, Wangchunshu Zhou, and Zhaoxiang Zhang.

CoRR, February, 2025

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework.

Co-authors include Jason Klein Liu.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ProgCo: Program Helps Self-Correction of Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Co-authors include Zhaoxiang Zhang.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024

Cooperative Multiagent Transfer Learning With Coalition Pattern Decomposition.

IEEE Trans. Games, June, 2024

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models.

CoRR, 2024

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework.

CoRR, 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization.

Co-authors include Michael Noukhovitch.

CoRR, 2024

PORTAL: Automatic Curricula Generation for Multiagent Reinforcement Learning.

Co-authors include Matthew E. Taylor.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

ASN: action semantics network for multiagent reinforcement learning.

Co-authors include Matthew E. Taylor.

Auton. Agents Multi Agent Syst., October, 2023

MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library.

J. Mach. Learn. Res., 2023

Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Off-Beat Multi-Agent Reinforcement Learning.

Co-authors include Svetlana Obraztsova and Zinovi Rabinovich.

Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

2022

Coach-assisted multi-agent reinforcement learning framework for unexpected crashed agents.

Frontiers Inf. Technol. Electron. Eng., 2022

MARLlib: Extending RLlib for Multi-agent Reinforcement Learning.

CoRR, 2022

A2C is a special case of PPO.

Co-authors include Anssi Kanervisto, Santiago Ontañón, and Rousslan Fernand Julien Dossa.

CoRR, 2022

API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks.

CoRR, 2022

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization.

CoRR, 2022

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Individual Reward Assisted Multi-Agent Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

2021

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment.

CoRR, 2021

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020

Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework.

CoRR, 2020

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Efficient Deep Reinforcement Learning via Adaptive Policy Transfer.

Co-authors include Zongzhang Zhang.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Action Semantics Network: Considering the Effects of Actions in Multiagent Systems.

Proceedings of the 8th International Conference on Learning Representations, 2020

Efficient Deep Reinforcement Learning through Policy Transfer.

Co-authors include Zongzhang Zhang.

Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-Agent Game Abstraction via Graph Attention Neural Network.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Achieving cooperation through deep multiagent reinforcement learning in sequential prisoner's dilemmas.

Co-authors include Matthew E. Taylor.

Proceedings of the First International Conference on Distributed Artificial Intelligence, 2019

Learning Adaptive Display Exposure for Real-Time Advertising.

Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems.

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018

Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning.

CoRR, 2018

Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach.

Co-authors include Matthew E. Taylor.

CoRR, 2018

2012

Energy-Aware Scheduling and Dynamic Reconfiguration in Real-Time Systems.

Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

System-Wide Leakage-Aware Energy Minimization Using Dynamic Voltage Scaling and Cache Reconfiguration in Multitasking Systems.

IEEE Trans. Very Large Scale Integr. Syst., 2012

Dynamic Cache Reconfiguration for Soft Real-Time Systems.

Co-authors include Ann Gordon-Ross.

ACM Trans. Embed. Comput. Syst., 2012

TCEC: Temperature and Energy-Constrained Scheduling in Real-Time Multitasking Systems.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Energy-aware dynamic slack allocation for real-time multitasking systems.

Sustain. Comput. Informatics Syst., 2012

A Novel Approach for Handling Misbehaving Nodes in Behavior-Aware Mobile Networking.

Co-authors include Srishti Mukherjee.

CoRR, 2012

2011

Energy-aware dynamic reconfiguration algorithms for real-time multitasking systems.

Sustain. Comput. Informatics Syst., 2011

Dynamic Reconfiguration of Two-Level Cache Hierarchy in Real-Time Embedded Systems.

J. Low Power Electron., 2011

A General Algorithm for Energy-Aware Dynamic Reconfiguration in Multitasking Systems.

Proceedings of the VLSI Design 2011: 24th International Conference on VLSI Design, 2011

Dynamic cache reconfiguration and partitioning for energy optimization in real-time multi-core systems.

Proceedings of the 48th Design Automation Conference, 2011

2010

Leakage-Aware Energy Minimization Using Dynamic Voltage Scaling and Cache Reconfiguration in Real-Time Systems.

Proceedings of the VLSI Design 2010: 23rd International Conference on VLSI Design, 2010

Temperature- and energy-constrained scheduling in multitasking systems: a model checking approach.

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

PreDVS: preemptive dynamic voltage scaling for real-time systems using approximation scheme.

Proceedings of the 47th Design Automation Conference, 2010

2009

SACR: Scheduling-Aware Cache Reconfiguration for Real-Time Embedded Systems.

Co-authors include Ann Gordon-Ross.

Proceedings of the VLSI Design 2009: Improving Productivity through Higher Abstraction, 2009

Dynamic Reconfiguration of Two-Level Caches in Soft Real-Time Embedded Systems.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009
