Bosi Wen

Orcid: 0009-0001-9484-0662

According to our database, Bosi Wen authored at least 20 papers between 2021 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2026

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models.

CoRR, March, 2026

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis.

CoRR, March, 2026

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces.

CoRR, March, 2026

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation.

Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2026

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing.

Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing.

CoRR, August, 2025

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Training Language Model to Critique for Better Refinement.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

CharacterBench: Benchmarking Character Customization of Large Language Models.

Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024

Benchmarking Complex Instruction-Following with Multiple Constraints Composition.

CoRR, 2024

Benchmarking Complex Instruction-Following with Multiple Constraints Composition.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CharacterGLM: Customizing Social Characters with Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

AlignBench: Benchmarking Chinese Alignment of Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ToMBench: Benchmarking Theory of Mind in Large Language Models.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

AlignBench: Benchmarking Chinese Alignment of Large Language Models.

CoRR, 2023

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation.

CoRR, 2023

CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models.

CoRR, 2023

2021

EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training.

CoRR, 2021
