Large neural language models (LMs) like BERT can be finetuned toyield state-of-the-art results on many NLP tasks . However, analysis shows that what they learn fails to model any sort of broad notion ofwhich entities are semantically comparable or similar . Performance is highly correlated withco-occurrences between specific entities observed in the training set . This is true both for models that are pretrained on general text corpora, as well as models trained on a large corpus of comparison questions . Our study thusinforces recent results on the difficulty of making claims about a deepmodel’s world knowledge or linguistic competence based on performance on benchmark problems . We make our evaluation datasets publicly available to foster future research on complex understanding and reasoning in such models at standards of human interaction. We make an evaluation dataset publicly available. We hope to encourage further research on such models to use these models to test their effectiveness and to develop new understanding and understanding of such models’ understanding and capabilities. We are confident that such models

Author(s) : Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen

Links : PDF - Abstract

Code :


Keywords : models - understanding - results - large - evaluation -

Leave a Reply

Your email address will not be published. Required fields are marked *