Bosi Wen

Orcid: 0009-0001-9484-0662

According to our database, Bosi Wen authored at least 20 papers between 2021 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation.
CoRR, March, 2026

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models.
CoRR, March, 2026

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis.
CoRR, March, 2026

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces.
CoRR, March, 2026

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing.
Proceedings of the Fortieth AAAI Conference on Artificial Intelligence, 2026

2025
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation.
CoRR, November, 2025

RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing.
CoRR, August, 2025

HPSS: Heuristic Prompting Strategy Search for LLM Evaluators.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

Training Language Model to Critique for Better Refinement.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

CharacterBench: Benchmarking Character Customization of Large Language Models.
Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025

2024
Benchmarking Complex Instruction-Following with Multiple Constraints Composition.
CoRR, 2024

Benchmarking Complex Instruction-Following with Multiple Constraints Composition.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

CharacterGLM: Customizing Social Characters with Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

AlignBench: Benchmarking Chinese Alignment of Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

ToMBench: Benchmarking Theory of Mind in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models.
CoRR, 2023

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation.
CoRR, 2023

CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models.
CoRR, 2023

2021
EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training.
CoRR, 2021

