Major languages such as English and Finnish have many natural language processing (NLP) resources and models, but this is not the case for low-resourced and endangered languages, for which such resources are scarce. In this paper, we present a method for constructing word embeddings for endangered languages. The endangered languages we work with here are Erzya, Moksha, Komi-Zyrian and Skolt Sami. Using cross-lingual word embeddings, we build a universal sentiment analysis model for all the languages in this study, whether endangered or not. The evaluation shows that our word embedding models are well aligned with those of the resource-rich languages and are suitable for training task-specific models, as demonstrated by our sentiment analysis models, which achieved high accuracy. All our cross-lingual word embeddings and the sentiment analysis model have been released openly via an easy-to-use Python library.
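The abstract does not say how the endangered-language embeddings are aligned with the resource-rich ones. Purely as an illustration, the sketch below assumes a small seed dictionary of translation pairs and uses orthogonal Procrustes mapping, a common technique for this kind of cross-lingual alignment. It is not necessarily the author's method. The function names `procrustes_align` and `nearest_neighbour` and the random toy data are hypothetical stand-ins for real embeddings.

```python
import numpy as np

def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays whose i-th rows embed a hypothetical
    seed-dictionary pair (a source word and its translation).
    """
    # Closed-form solution: SVD of the cross-covariance matrix src^T tgt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def nearest_neighbour(vec: np.ndarray, space: np.ndarray) -> int:
    """Index of the row in `space` with the highest cosine similarity to `vec`."""
    sims = space @ vec / (np.linalg.norm(space, axis=1) * np.linalg.norm(vec))
    return int(np.argmax(sims))

# Toy data (assumption): the "endangered-language" space is a hidden
# rotation of the "resource-rich" space.
rng = np.random.default_rng(0)
dim, vocab = 50, 200
tgt_space = rng.normal(size=(vocab, dim))
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_space = tgt_space @ hidden_rotation.T

# Learn the mapping from the first 100 word pairs only.
W = procrustes_align(src_space[:100], tgt_space[:100])

# Map a held-out source word into the target space and look up its neighbour.
mapped = src_space[150] @ W
print(nearest_neighbour(mapped, tgt_space))  # prints 150
```

Because `W` is orthogonal, it preserves distances and angles within the source space. Once vectors from every language live in one shared space, a single classifier trained mostly on resource-rich data can in principle be applied to the mapped endangered-language vectors, which is one plausible reading of how a universal sentiment model can cover all the languages.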

Author(s): Khalid Alnajjar

Keywords: languages, models, word, endangered, analysis