Seonghoon Yang

According to our database, Seonghoon Yang authored at least 4 papers between 2024 and 2025.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2025
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models.
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora.
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2025

2024
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models.
CoRR, 2024

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2024

