Guijin Son

According to our database, Guijin Son authored at least 20 papers between 2023 and 2025.

Bibliography

2025
From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation.
CoRR, July, 2025

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation.
CoRR, June, 2025

Controlling Language Confusion in Multilingual LLMs.
CoRR, May, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research.
CoRR, May, 2025

On the Robustness of Reward Models for Language Model Alignment.
CoRR, May, 2025

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean.
CoRR, March, 2025

Won: Establishing Best Practices for Korean Financial NLP.
CoRR, March, 2025

Multi-Step Reasoning in Korean and the Emergent Mirage.
CoRR, January, 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap.
CoRR, January, 2025

KMMLU: Measuring Massive Multitask Language Understanding in Korean.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Improving Fine-grained Visual Understanding in VLMs through Text-Only Training.
CoRR, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models.
CoRR, 2024

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do.
CoRR, 2024

ESG Classification by Implicit Rule Learning via GPT-4.
CoRR, 2024

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models.
CoRR, 2023

Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance.
CoRR, 2023