Guijin Son

According to our database, Guijin Son authored at least 36 papers between 2023 and 2026.

Bibliography

2026
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training.
CoRR, May, 2026

ResearchMath-14K: Scaling Research-Level Mathematics via Agents.
CoRR, May, 2026

Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback.
CoRR, May, 2026

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs.
CoRR, May, 2026

Pushing the Boundaries of Multiple Choice Evaluation to One Hundred Options.
CoRR, April, 2026

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context.
CoRR, April, 2026

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math.
CoRR, February, 2026

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models.
CoRR, January, 2026

2025
Revisiting the UID Hypothesis in LLM Reasoning Traces.
CoRR, October, 2025

Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces.
CoRR, October, 2025

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought.
CoRR, October, 2025

KAIO: A Collection of More Challenging Korean Questions.
CoRR, September, 2025

Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context.
CoRR, September, 2025

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation.
CoRR, July, 2025

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation.
CoRR, June, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research.
CoRR, May, 2025

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean.
CoRR, March, 2025

Won: Establishing Best Practices for Korean Financial NLP.
CoRR, March, 2025

Multi-Step Reasoning in Korean and the Emergent Mirage.
CoRR, January, 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap.
CoRR, January, 2025

KMMLU: Measuring Massive Multitask Language Understanding in Korean.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

On the Robustness of Reward Models for Language Model Alignment.
Proceedings of the Forty-second International Conference on Machine Learning, 2025

Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

FINKRX: Establishing Best Practices for Korean Financial NLP.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Controlling Language Confusion in Multilingual LLMs.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2025

2024
Improving Fine-grained Visual Understanding in VLMs through Text-Only Training.
CoRR, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models.
CoRR, 2024

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do.
CoRR, 2024

ESG Classification by Implicit Rule Learning via GPT-4.
CoRR, 2024

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models.
CoRR, 2023

Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance.
CoRR, 2023
