Guijin Son

According to our database, Guijin Son authored at least 36 papers between 2023 and 2026.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2026

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training.

CoRR, May, 2026

ResearchMath-14K: Scaling Research-Level Mathematics via Agents.

CoRR, May, 2026

Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback.

CoRR, May, 2026

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs.

Authors include Catherine Arnett, Seunghyeok Hong, Alexander B. Ivanov, Boboev Muhammadjon, Christian Stump, Cooper R. Anderson, Inomov Mashrafdzhon, Nicolas Libedinsky, Rafal Marcin Lochowski, Raphaël Lachièze-Rey, Robert Auffarth, and Zoltán Kovács.

CoRR, May, 2026

Pushing the Boundaries of Multiple Choice Evaluation to One Hundred Options.

CoRR, April, 2026

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context.

CoRR, April, 2026

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math.

Authors include Hitesh Laxmichand Patel.

CoRR, February, 2026

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models.

Authors include Seunghyeok Hong.

CoRR, January, 2026

2025

Revisiting the UID Hypothesis in LLM Reasoning Traces.

CoRR, October, 2025

Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces.

CoRR, October, 2025

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought.

Authors include Hitesh Laxmichand Patel.

CoRR, October, 2025

KAIO: A Collection of More Challenging Korean Questions.

CoRR, September, 2025

Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context.

CoRR, September, 2025

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation.

CoRR, July, 2025

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation.

Authors include Hitesh Laxmichand Patel.

CoRR, June, 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research.

Authors include Stella Biderman.

CoRR, May, 2025

HRET: A Self-Evolving LLM Evaluation Toolkit for Korean.

Authors include Seunghyeok Hong.

CoRR, March, 2025

Won: Establishing Best Practices for Korean Financial NLP.

CoRR, March, 2025

Multi-Step Reasoning in Korean and the Emergent Mirage.

CoRR, January, 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap.

CoRR, January, 2025

KMMLU: Measuring Massive Multitask Language Understanding in Korean.

Authors include Niklas Muennighoff and Stella Biderman.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models.

Authors include Sheikh Shafayat and Bill Yuchen Lin.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

On the Robustness of Reward Models for Language Model Alignment.

Proceedings of the Forty-second International Conference on Machine Learning, 2025

Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?

Authors include Javier Aula-Blasco, Silvia Paniagua Suárez, Malte Ostendorff, Aitor Gonzalez-Agirre, and Roberto Navigli.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

FINKRX: Establishing Best Practices for Korean Financial NLP.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), 2025

Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Controlling Language Confusion in Multilingual LLMs.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2025

2024

Improving Fine-grained Visual Understanding in VLMs through Text-Only Training.

Authors include Seunghyeok Hong.

CoRR, 2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models.

Authors include Javier Aula-Blasco, Shayekh Bin Islam, Jaume Prats-Cristià, and Lucía Tormo-Bañuelos.

CoRR, 2024

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do.

Authors include Seunghyeok Hong.

CoRR, 2024

ESG Classification by Implicit Rule Learning via GPT-4.

CoRR, 2024

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Beyond Classification: Financial Reasoning in State-of-the-Art Language Models.

CoRR, 2023

Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance.

CoRR, 2023
