We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news, are one of the primary predictors of linguistic variation. We introduce two new register-annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform the previous state of the art in English and Finnish. We further analyse the classification results, finding that certain registers continue to pose challenges, in particular for cross-language transfer.
Author(s) : Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo, Veronika Laippala
Keywords : classification - cross-lingual - registers - transfer