The only requirement is that there are some shared parameters in the top layers of the multilingual encoder. We show that transfer is possible even when there is no shared vocabulary across the monolingual corpora and also when the text comes from very different domains. We also show that representations from monolingual BERT models in different languages can be aligned post-hoc quite effectively, strongly suggesting that, much like for non-contextual word embeddings, there are universal latent symmetries in the learned embedding spaces. For multilingual masked language modeling, these symmetries are automatically discovered and aligned during the joint training process.
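To make the idea of post-hoc alignment concrete, here is a minimal sketch of orthogonal Procrustes, a technique commonly used to align non-contextual word embeddings. The text above does not say which alignment method was used, so this is an illustrative assumption rather than the authors' procedure. The arrays `emb_a` and `emb_b`, the dimensions, and the helper `procrustes_align` are all hypothetical toy choices.

```python
import numpy as np


def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F."""
    # The closed-form solution comes from the SVD of the cross-covariance:
    # src.T @ tgt = U S Vt  =>  W = U @ Vt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data (hypothetical): the "language B" vectors are an exactly rotated
# copy of the "language A" vectors, so a perfect alignment exists.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 16))
true_rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
emb_b = emb_a @ true_rotation

# Learn the mapping from paired rows, then measure the residual.
w = procrustes_align(emb_a, emb_b)
alignment_error = float(np.linalg.norm(emb_a @ w - emb_b))
```

In this toy setting the learned map recovers the hidden rotation, so the residual is close to zero. With real monolingual models the rows would have to be paired representations (for example, from a bilingual dictionary or parallel sentences), and the fit would only be approximate.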

