Saiyong Yang

According to our database¹, Saiyong Yang authored at least 21 papers between 2011 and 2026.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2026

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation.

CoRR, May, 2026

RLVR Datasets and Where to Find Them: Tracing Data Lineage for Better Training Data.

CoRR, May, 2026

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation.

CoRR, May, 2026

Debiased Model-based Representations for Sample-efficient Continuous Control.

CoRR, May, 2026

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex.

CoRR, May, 2026

Tool Learning Needs Nothing More Than a Free 8B Language Model.

CoRR, April, 2026

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation.

CoRR, February, 2026

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models.

CoRR, February, 2026

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models.

CoRR, February, 2026

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning.

CoRR, January, 2026

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

2025

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control.

CoRR, November, 2025

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation.

CoRR, November, 2025

Think Outside the Policy: In-Context Steered Policy Optimization.

CoRR, October, 2025

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error.

CoRR, October, 2025

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding.

CoRR, October, 2025

Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners.

CoRR, September, 2025

RAG-Targeted SFT Improves RAG-Enhanced Math Reasoning.

Proceedings of the Natural Language Processing and Chinese Computing, 2025

Eliminating Retrieval Knowledge Conflicts: Cross-Validation Re-ranking with Large Language Models.

Proceedings of the International Joint Conference on Neural Networks, 2025

2012

A low-cost hand gesture human-computer interaction system.

Leyuan Liu, Nong Sang, Saiyong Yang

Proceedings of the IEEE International Conference on Consumer Electronics, 2012

2011

Real-time skin color detection under rapidly changing illumination conditions.

IEEE Trans. Consumer Electron., 2011
