Chanjun Park

ORCID: 0000-0002-7200-9632

Affiliations:
  • Korea University, Seoul, South Korea


According to our database, Chanjun Park authored at least 101 papers between 2020 and 2026.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2026
Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models.
CoRR, March, 2026

LANGSAE EDITING: Improving Multilingual Information Retrieval via Post-hoc Language Identity Removal.
CoRR, January, 2026

2025
KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models.
CoRR, October, 2025

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning.
CoRR, September, 2025

HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models.
CoRR, April, 2025

Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning.
CoRR, April, 2025

Like Father, Like Son: Kinship-Aware Preference Mapping (KARMA) for Automatic Alignment in Large Language Models.
CoRR, February, 2025

An analysis on language transfer of pre-trained language model with cross-lingual post-training.
Expert Syst. Appl., 2025

CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

CoME: An Unlearning-based Approach to Conflict-free Model Editing.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

sDPO: Don't Use Your Data All at Once.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

HealthGenie: A Knowledge-Driven LLM Framework for Tailored Dietary Guidance.
Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots.
Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics, 2025

From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2025

2024
Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance.
CoRR, 2024

InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets.
CoRR, 2024

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models.
CoRR, 2024

ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction.
CoRR, 2024

Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona.
CoRR, 2024

Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism.
CoRR, 2024

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline.
CoRR, 2024

Exploiting Hanja-Based Resources in Processing Korean Historic Documents Written by Common Literati.
IEEE Access, 2024

Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, 2024

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2024

Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, 2024

Translation of Multifaceted Data without Re-Training of Machine Translation Systems.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Search if you don't know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Evalverse: Unified and Accessible Library for Large Language Model Evaluation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Doubts on the reliability of parallel corpus filtering.
Expert Syst. Appl., December, 2023

Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation.
CoRR, 2023

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning.
CoRR, 2023

Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction.
CoRR, 2023

Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps).
CoRR, 2023

Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios.
CoRR, 2023

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards.
CoRR, 2023

DMOps: Data Management Operation and Recipes.
CoRR, 2023

Uncovering the Risks and Drawbacks Associated With the Use of Synthetic Data for Grammatical Error Correction.
IEEE Access, 2023

Improving Formality-Sensitive Machine Translation Using Data-Centric Approaches and Prompt Engineering.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse.
Proceedings of the IEEE International Conference on Data Mining, 2023

CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

PEEP-Talk: A Situational Dialogue-based Chatbot for English Education.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022
PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge.
Knowl. Based Syst., 2022

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models.
CoRR, 2022

Empirical study on BlenderBot 2.0 Errors Analysis in terms of Model, Data and User-Centric Approach.
CoRR, 2022

AI for Patents: A Novel Yet Effective and Efficient Framework for Patent Analysis.
IEEE Access, 2022

Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners.
IEEE Access, 2022

Mimicking Infants' Bilingual Language Acquisition for Domain Specialized Neural Machine Translation.
IEEE Access, 2022

An Automatic Post Editing With Efficient and Simple Data Generation Method.
IEEE Access, 2022

K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria.
IEEE Access, 2022

Utilization Strategy of User Engagements in Korean Fake News Detection.
IEEE Access, 2022

Word-Level Quality Estimation for Korean-English Neural Machine Translation.
IEEE Access, 2022

KU X Upstage's Submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task.
Proceedings of the Seventh Conference on Machine Translation, 2022

A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Priming Ancient Korean Neural Machine Translation.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021
Neural spelling correction: translating incorrect sentences to correct sentences for multimedia.
Multim. Tools Appl., 2021

A Self-Supervised Automatic Post-Editing Data Generation Tool.
CoRR, 2021

A New Tool for Efficiently Generating Quality Estimation Datasets.
CoRR, 2021

Automatic Knowledge Augmentation for Generative Commonsense Reasoning.
CoRR, 2021

How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus.
CoRR, 2021

Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC.
CoRR, 2021

Who says like a style of Vitamin: Towards Syntax-Aware DialogueSummarization using Multi-task Learning.
CoRR, 2021

PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities.
CoRR, 2021

An Empirical Study on Automatic Post Editing for Neural Machine Translation.
IEEE Access, 2021

Who Speaks Like a Style of Vitamin: Towards Syntax-Aware Dialogue Summarization Using Multi-Task Learning.
IEEE Access, 2021

Grounded Vocabulary for Image Retrieval Using a Modified Multi-Generator Generative Adversarial Network.
IEEE Access, 2021

Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 2021

BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text.
Proceedings of the 8th Workshop on Asian Translation, 2021

2020
Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection.
IEEE Access, 2020

Ancient Korean Neural Machine Translation.
IEEE Access, 2020

