Two larger multilingual masked language models, with 3.5B and 10.7B parameters, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity can achieve strong performance on high-resource languages while greatly improving results on low-resource languages.
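For readers unfamiliar with the pretraining objective these models share, here is a minimal sketch of BERT-style masked-language-model input corruption (select roughly 15% of positions; of those, 80% become a mask token, 10% a random token, 10% stay unchanged). The function name and the toy token list are illustrative, not from the paper.

```python
import random

def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, rng=None):
    """BERT-style masking sketch: pick ~mask_prob of positions as targets;
    replace 80% of them with mask_token, 10% with a random token from the
    sequence, and leave 10% unchanged. The model is trained to predict the
    original token at each selected position."""
    rng = rng or random.Random(0)
    masked = list(tokens)
    targets = [None] * len(tokens)  # non-None only at selected positions
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                masked[i] = mask_token          # 80%: mask
            elif r < 0.9:
                masked[i] = rng.choice(tokens)  # 10%: random replacement
            # else: 10%: keep the original token
    return masked, targets
```

The loss is computed only where `targets` is non-None, which is what lets the same objective scale across the 100 languages in the pretraining corpus.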

Author(s) : Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau


Keywords: multilingual masked language models
