Learning bilingual word embeddings with almost no bilingual data

Most methods to learn bilingual word embeddings rely on large parallel corpora, which are difficult to obtain for most language pairs. This has motivated an active research line to relax this requirement, with methods that use document-aligned corpora or bilingual dictionaries of a few thousand words instead. In this work, we further reduce the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique. Our method exploits the structural similarity of embedding spaces, and works with as little bilingual evidence as a 25-word dictionary or even an automatically generated list of numerals.
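The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general self-learning idea, written under explicit assumptions. It alternates between (1) learning a mapping from the current dictionary and (2) inducing a new dictionary from the mapped embeddings. It uses orthogonal Procrustes as the dictionary-based mapping technique, nearest-neighbour cosine retrieval for dictionary induction, and stops once the induced dictionary no longer changes. All of these choices, the function names, and the NumPy dependency are illustrative assumptions. This is not the authors' released code.

```python
import numpy as np


def normalize(m):
    # Length-normalize rows so that dot products equal cosine similarities.
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def learn_orthogonal_map(x, z, pairs):
    # Assumed mapping technique: orthogonal Procrustes.
    # W = U V^T, where U S V^T is the SVD of X_D^T Z_D over the dictionary pairs.
    src = [i for i, _ in pairs]
    trg = [j for _, j in pairs]
    u, _, vt = np.linalg.svd(x[src].T @ z[trg])
    return u @ vt


def induce_dictionary(x, z, w):
    # Pair every mapped source word with its nearest target word (cosine).
    sims = (x @ w) @ z.T
    return [(i, int(j)) for i, j in enumerate(sims.argmax(axis=1))]


def self_learning(x, z, seed_pairs, max_iter=50):
    # x: source embeddings (n_src x d), z: target embeddings (n_trg x d),
    # seed_pairs: small list of (source_index, target_index) pairs.
    x, z = normalize(x), normalize(z)
    pairs = list(seed_pairs)
    w = None
    for _ in range(max_iter):
        w = learn_orthogonal_map(x, z, pairs)
        new_pairs = induce_dictionary(x, z, w)
        # Assumed convergence criterion: the induced dictionary is unchanged.
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return w, pairs
```

As a toy usage example, if the source space is an exact rotation of the target space (`x = z @ q` for an orthogonal `q`), a seed of as many pairs as there are dimensions is enough for this sketch to recover the full identity dictionary. Real embedding spaces are only approximately isomorphic, which is why the iterative refinement matters in practice.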

Keywords: bilingual, word, learning, methods, embeddings
