A large percentage of the world’s population speaks a language of the Indian subcontinent, comprising languages from both the Indo-Aryan (e.g. Hindi, Punjabi, Gujarati) and Dravidian families. A universal characteristic of Indian languages is their complex morphology, which, combined with the lack of sufficient quantities of high-quality parallel data, can make developing machine translation systems for these languages difficult. We propose a technique called Unified Transliteration and Subword Segmentation to leverage language similarity while exploiting parallel data from related language pairs. We also propose a Multilingual Transfer Learning technique to leverage parallel data to assist translation for low-resource language pairs of interest. Our experiments demonstrate an overall average improvement of 5 BLEU points over standard Transformer-based NMT baselines.
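The abstract does not say how the unified transliteration and subword segmentation step is implemented, so the following is a minimal sketch under stated assumptions, not the authors' method. It relies on the fact that the Unicode blocks for the major Indic scripts follow a largely parallel layout (inherited from ISCII), so a character can be mapped onto Devanagari by its offset within its block. It then learns one shared set of byte-pair-encoding (BPE) merges over the transliterated text, so related languages end up sharing subword units. The names `SCRIPT_BLOCKS`, `to_common_script`, and `learn_bpe`, and the choice of Devanagari as the common script, are hypothetical.

```python
from collections import Counter

# Start code point of each Unicode block (each block spans 0x80 code points).
# Assumption: Devanagari is used as the common target script.
SCRIPT_BLOCKS = {
    "devanagari": 0x0900,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
TARGET_BLOCK = SCRIPT_BLOCKS["devanagari"]


def to_common_script(text: str) -> str:
    """Naively map characters from any listed Indic block onto Devanagari by offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        for start in SCRIPT_BLOCKS.values():
            if start <= cp < start + 0x80:
                # Same offset within the block usually denotes the same sound.
                cp = TARGET_BLOCK + (cp - start)
                break
        out.append(chr(cp))  # non-Indic characters pass through unchanged
    return "".join(out)


def learn_bpe(words: Counter, num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges (Sennrich-style) from a word-frequency table."""
    # Each word becomes a tuple of code points plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    # The consonant sequence k-m-l written in Devanagari, Gujarati and Gurmukhi.
    corpus = ["कमल", "કમલ", "ਕਮਲ"]
    unified = Counter(to_common_script(w) for w in corpus)
    print(unified)  # all three spellings collapse onto one Devanagari form
    print(learn_bpe(unified, num_merges=2))
```

Because all three spellings share one form after transliteration, their counts pool together when BPE merges are learned, which is the kind of cross-lingual sharing the technique aims for. The offset mapping is deliberately naive: some positions are unassigned or script-specific in individual blocks (Tamil, for instance, has no separate aspirated consonants), and a practical system would handle those cases and would likely segment on grapheme clusters rather than raw code points.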
