A major challenge of research on non-English machine reading for question answering (QA) is the lack of annotated datasets. In this paper, we present GermanQuAD, a dataset of 13,722 extractive question/answer pairs. An extractive QA model trained on GermanQuAD significantly outperforms multilingual models. We demonstrate the wide range of applications of GermanQuAD by adapting it to GermanDPR, a training dataset for dense passage retrieval (DPR), and we train and evaluate the first non-English DPR model. We conclude that machine-translated training data cannot fully substitute for hand-annotated training data in the target language.

Author(s) : Timo Möller, Julian Risch, Malte Pietsch

Keywords : training - dataset - DPR - model - data
