We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news, are one of the primary predictors of linguistic variation. We introduce two new register-annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform the previous state of the art in English and Finnish. We further analyse the classification results, finding that certain registers continue to pose challenges, in particular for cross-language transfer.

Author(s) : Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo, Veronika Laippala

Keywords : classification - cross - registers - lingual - transfer
