Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation.

Chanhee Park

Hyeonseok Moon

Chanjun Park

Heuiseok Lim

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard.

Chanjun Park

Hyeonwoo Kim

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

CoME: An Unlearning-based Approach to Conflict-free Model Editing.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs.

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction.

Jeesu Jung

Chanjun Park

Sangkeun Jung

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning.

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

sDPO: Don't Use Your Data All at Once.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

HealthGenie: A Knowledge-Driven LLM Framework for Tailored Dietary Guidance.

Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots.

Luis Marquez-Carpintero

Proceedings of the 34th ACM International Conference on Information and Knowledge Management, 2025

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2025

2024

Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance.

CoRR, 2024

InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets.

Yungi Kim

Chanjun Park

CoRR, 2024

1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models.

CoRR, 2024

ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction.

Jeiyoon Park

Chanjun Park

Heuiseok Lim

CoRR, 2024

Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona.

Jeiyoon Park

Chanjun Park

Heuiseok Lim

CoRR, 2024

Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism.

Chanjun Park

Minsoo Khang

Dahyun Kim

CoRR, 2024

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline.

CoRR, 2024

Exploiting Hanja-Based Resources in Processing Korean Historic Documents Written by Common Literati.

IEEE Access, 2024

Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, 2024

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2024

Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, 2024

Translation of Multifaceted Data without Re-Training of Machine Translation Systems.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Search if you don't know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Evalverse: Unified and Accessible Library for Large Language Model Evaluation.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation.

Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Doubts on the reliability of parallel corpus filtering.

Expert Syst. Appl., December, 2023

Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation.

CoRR, 2023

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning.

CoRR, 2023

Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction.

CoRR, 2023

Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps).

CoRR, 2023

Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios.

NamHyeok Kim

Chanjun Park

CoRR, 2023

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards.

CoRR, 2023

DMOps: Data Management Operation and Recipes.

Eujeong Choi

Chanjun Park

CoRR, 2023

Uncovering the Risks and Drawbacks Associated With the Use of Synthetic Data for Grammatical Error Correction.

IEEE Access, 2023

Improving Formality-Sensitive Machine Translation Using Data-Centric Approaches and Prompt Engineering.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection.

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse.

Proceedings of the IEEE International Conference on Data Mining, 2023

CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

PEEP-Talk: A Situational Dialogue-based Chatbot for English Education.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2023

2022

PU-GEN: Enhancing generative commonsense reasoning for language models with human-centered knowledge.

Knowl. Based Syst., 2022

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models.

CoRR, 2022

Empirical study on BlenderBot 2.0 Errors Analysis in terms of Model, Data and User-Centric Approach.

CoRR, 2022

AI for Patents: A Novel Yet Effective and Efficient Framework for Patent Analysis.

IEEE Access, 2022

Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners.

IEEE Access, 2022

Mimicking Infants' Bilingual Language Acquisition for Domain Specialized Neural Machine Translation.

IEEE Access, 2022

An Automatic Post Editing With Efficient and Simple Data Generation Method.

IEEE Access, 2022

K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria.

IEEE Access, 2022

Utilization Strategy of User Engagements in Korean Fake News Detection.

IEEE Access, 2022

Word-Level Quality Estimation for Korean-English Neural Machine Translation.

IEEE Access, 2022

KU X Upstage's Submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task.

Proceedings of the Seventh Conference on Machine Translation, 2022

A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Priming Ancient Korean Neural Machine Translation.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities.

Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021

Neural spelling correction: translating incorrect sentences to correct sentences for multimedia.

Multim. Tools Appl., 2021

A Self-Supervised Automatic Post-Editing Data Generation Tool.

CoRR, 2021

A New Tool for Efficiently Generating Quality Estimation Datasets.

CoRR, 2021

Automatic Knowledge Augmentation for Generative Commonsense Reasoning.

CoRR, 2021

How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus.

CoRR, 2021

Empirical Analysis of Korean Public AI Hub Parallel Corpora and in-depth Analysis using LIWC.

CoRR, 2021

Who says like a style of Vitamin: Towards Syntax-Aware Dialogue Summarization using Multi-task Learning.

CoRR, 2021

PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities.

CoRR, 2021

An Empirical Study on Automatic Post Editing for Neural Machine Translation.

IEEE Access, 2021

Who Speaks Like a Style of Vitamin: Towards Syntax-Aware Dialogue Summarization Using Multi-Task Learning.

IEEE Access, 2021

Grounded Vocabulary for Image Retrieval Using a Modified Multi-Generator Generative Adversarial Network.

IEEE Access, 2021

Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, 2021

BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text.

Proceedings of the 8th Workshop on Asian Translation, 2021

2020