We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing Korean PLMs. In addition to accelerating Korean NLP research, our comprehensive documentation of how KLUE was created will facilitate creating similar resources for other languages in the future. We see minimal degradation in performance even when personally identifiable information is replaced in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. We also find that BPE tokenization combined with morpheme-level pre-tokenization is effective for tasks involving morpheme-level tagging, detection, and generation. The new benchmark suite is available at https://klue-benchmark.com/.
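
To make the tokenization finding above concrete, here is a minimal sketch of combining morpheme-level pre-tokenization with BPE. It is not the authors' implementation: it assumes the konlpy Okt morphological analyzer and the Hugging Face tokenizers library, and the corpus paths and vocabulary size are placeholders rather than the paper's settings.

```python
# A minimal sketch (not the paper's implementation) of morpheme-level
# pre-tokenization followed by BPE. Assumes konlpy and the Hugging Face
# `tokenizers` library; file names and vocab size are placeholders.
from konlpy.tag import Okt
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

analyzer = Okt()

def morpheme_pretokenize(line: str) -> str:
    # Split a sentence into morphemes first, so that BPE merges
    # never cross morpheme boundaries.
    return " ".join(analyzer.morphs(line))

# Pre-tokenize a raw Korean corpus into morpheme-separated text.
with open("corpus.txt", encoding="utf-8") as f_in, \
     open("corpus.morph.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:
        f_out.write(morpheme_pretokenize(line.strip()) + "\n")

# Train a BPE vocabulary on the morpheme-segmented corpus.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(
    vocab_size=32000,
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)
tokenizer.train(["corpus.morph.txt"], trainer)

# At inference time, apply the same morpheme pre-tokenization before encoding.
print(tokenizer.encode(morpheme_pretokenize("한국어 자연어 이해 벤치마크")).tokens)
```

The design intuition is that keeping BPE merges inside morpheme boundaries preserves morpheme-level units, which plausibly helps tasks that label or detect spans at that granularity, such as Named Entity Recognition and Dependency Parsing.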

Author(s) : Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jungwoo Ha, Kyunghyun Cho

Links : PDF - Abstract

Code :
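For readers who want to try the benchmark, the sketch below shows one way to load a KLUE task and the KLUE-RoBERTa-large baseline. It assumes both are mirrored on the Hugging Face Hub under the dataset name "klue" and the model ID "klue/roberta-large"; consult https://klue-benchmark.com/ for the official distribution.

```python
# A hedged usage sketch, assuming the KLUE tasks and the KLUE-RoBERTa-large
# checkpoint are available on the Hugging Face Hub as "klue" and
# "klue/roberta-large". Not an official example from the paper.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

# Load one of the eight KLUE tasks, e.g. Semantic Textual Similarity (STS).
sts = load_dataset("klue", "sts")
print(sts["train"][0])

# Load the KLUE-RoBERTa-large baseline and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
model = AutoModel.from_pretrained("klue/roberta-large")

# Encode a sentence pair the way an STS fine-tuning setup would.
inputs = tokenizer("한국어 문장 하나", "비슷한 한국어 문장", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```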

Keywords : klue - korean - language - understanding - benchmark
