Guijin Son

According to our database, Guijin Son authored at least 25 papers between 2023 and 2025.

Bibliography

2025
Revisiting the UID Hypothesis in LLM Reasoning Traces.
CoRR, October, 2025

Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces.
CoRR, October, 2025

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought.
CoRR, October, 2025

KAIO: A Collection of More Challenging Korean Questions.
CoRR, September, 2025

Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context.
CoRR, September, 2025

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation.
CoRR, July, 2025

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation.
CoRR, June, 2025

Controlling Language Confusion in Multilingual LLMs.
CoRR, May, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research.
CoRR, May, 2025

On the Robustness of Reward Models for Language Model Alignment.
CoRR, May, 2025

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean.
CoRR, March, 2025

Won: Establishing Best Practices for Korean Financial NLP.
CoRR, March, 2025

Multi-Step Reasoning in Korean and the Emergent Mirage.
CoRR, January, 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap.
CoRR, January, 2025

KMMLU: Measuring Massive Multitask Language Understanding in Korean.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

2024
Improving Fine-grained Visual Understanding in VLMs through Text-Only Training.
CoRR, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models.
CoRR, 2024

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do.
CoRR, 2024

ESG Classification by Implicit Rule Learning via GPT-4.
CoRR, 2024

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models.
CoRR, 2023

Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance.
CoRR, 2023

