Shengyi Huang

ORCID: 0000-0003-4986-1365

According to our database, Shengyi Huang authored at least 20 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.

CoRR, August, 2025

Generalizing Verifiable Instruction Following.

Co-authors include: Valentina Pyatkin, Hannaneh Hajishirzi

CoRR, July, 2025

2 OLMo 2 Furious.

CoRR, January, 2025

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models.

Co-authors include: Michael Noukhovitch, Sophie Xhonneux, Rishabh Agarwal, Aaron C. Courville

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training.

CoRR, 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization.

Co-authors include: Michael Noukhovitch

CoRR, 2024

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning.

CoRR, 2024

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform.

Co-authors include: Rujikorn Charakorn, Santiago Ontañón

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Zephyr: Direct Distillation of LM Alignment.

Co-authors include: Edward Beeching, Leandro von Werra, Clémentine Fourrier, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush

CoRR, 2023

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks.

Co-authors include: John P. Dickerson

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms.

Co-authors include: Rousslan Fernand Julien Dossa, Dipam Chakraborty, João G. M. Araújo

J. Mach. Learn. Res., 2022

A2C is a special case of PPO.

Co-authors include: Anssi Kanervisto, Santiago Ontañón, Rousslan Fernand Julien Dossa

CoRR, 2022

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine.

Co-authors include: Denys Makoviichuk, Viktor Makoviychuk

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms.

Co-authors include: Santiago Ontañón

Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, 2022

2021

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms.

Co-authors include: Rousslan Fernand Julien Dossa

CoRR, 2021

An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization.

Co-authors include: Rousslan Fernand Julien Dossa, Santiago Ontañón, Takashi Matsubara

IEEE Access, 2021

Gym-µRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning.

Co-authors include: Santiago Ontañón

Proceedings of the 2021 IEEE Conference on Games (CoG), 2021

2020

Griddly: A platform for AI research in games.

CoRR, 2020

Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games.

Co-authors include: Santiago Ontañón

CoRR, 2020

2019

Comparing Observation and Action Representations for Deep Reinforcement Learning in MicroRTS.

Co-authors include: Santiago Ontañón

CoRR, 2019
