Machine Translation (MT) and Natural Language Processing (NLP) research has shown that existing models amplify biases observed in the training data. We hypothesize that 'algorithmic bias', i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms – phrase-based statistical (PB-SMT) and neural MT (NMT). Our experiments show that there is a loss of lexical richness in the translations produced by all investigated MT paradigms.
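The lexical richness the abstract refers to is commonly quantified with measures such as the type/token ratio (TTR): the number of distinct word types divided by the total number of tokens. A minimal sketch of this idea (an illustrative metric, not the paper's exact evaluation pipeline):

```python
# Illustrative type/token ratio (TTR), a simple lexical-richness measure.
# A lower TTR indicates more repetition and a less diverse vocabulary,
# the kind of impoverishment the abstract attributes to MT output.
def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()  # naive whitespace tokenization
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical example strings, only to show the metric's behaviour:
diverse = "the quick brown fox jumps over the lazy dog"
repetitive = "the the the quick quick fox fox fox dog"
print(type_token_ratio(diverse))     # higher: many distinct word types
print(type_token_ratio(repetitive))  # lower: heavy repetition
```

In practice such scores are computed over large corpora of human vs. machine translations; length-corrected variants (e.g. moving-average TTR) are preferred because raw TTR decreases as text length grows.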

Author(s) : Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

Links : PDF - Abstract

Code :

https://github.com/tensorflow/tensor2tensor

Keywords : machine - mt - algorithmic - observed - translationese
