Model Selection for Cross-Lingual Transfer using a Learned Scoring Function

Transformers pre-trained on multilingual text corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer learning results. No target-language validation data is assumed in this setting; however, substantial variance has been observed in target-language performance between different fine-tuning runs. Previous attempts to select among models fine-tuned with different learning rates, numbers of steps and other hyperparameters have often resulted in suboptimal choices. We instead select models with a learned scoring function. In extensive experiments, we find that our approach consistently selects better models than English validation data across five languages and five well-studied NLP tasks, achieving results comparable to model selection with small amounts of target-language development data.
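To make the selection problem concrete, here is a minimal sketch contrasting the English-validation baseline with selection by a scoring function. It is an illustration under stated assumptions, not the authors' implementation: the `Candidate` container, the feature vectors, the linear form of the scorer and its weights are all hypothetical, and the summary above does not say what the real scoring function takes as input or how it is trained.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One fine-tuning run (hypothetical container, not from the paper)."""
    name: str
    english_dev_score: float   # score on English validation data
    features: Sequence[float]  # placeholder per-model features for the scorer


def select_by_english_dev(candidates: Sequence[Candidate]) -> Candidate:
    """Baseline: pick the run with the best English validation score."""
    return max(candidates, key=lambda c: c.english_dev_score)


def select_by_scorer(
    candidates: Sequence[Candidate],
    score_fn: Callable[[Sequence[float]], float],
) -> Candidate:
    """Pick the run that the scoring function ranks highest."""
    return max(candidates, key=lambda c: score_fn(c.features))


def linear_scorer(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Assumed scorer form: a dot product with fixed weights.

    In practice the weights would be learned; here they are made up.
    """
    def score(features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, features))
    return score


# Made-up runs that differ in hyperparameters such as learning rate or steps.
candidates = [
    Candidate("run_a", english_dev_score=0.91, features=[0.2, 0.7]),
    Candidate("run_b", english_dev_score=0.89, features=[0.9, 0.4]),
    Candidate("run_c", english_dev_score=0.90, features=[0.5, 0.5]),
]

scorer = linear_scorer([1.0, 0.5])  # illustrative weights only
baseline_pick = select_by_english_dev(candidates)
scorer_pick = select_by_scorer(candidates, scorer)
print(baseline_pick.name, scorer_pick.name)
```

The two strategies can disagree, as they do on this toy data: the run with the best English validation score is not the one the scorer ranks first. Which choice transfers better to a target language is an empirical question that this sketch does not answer.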

Keywords : language - data - target - cross - approach
