Several deep learning architectures have been proposed over the last years to tackle the problem of generating a written report given an imaging exam as input. Most works evaluate the generated reports using standard Natural Language Processing (NLP) metrics (e.g. BLEU, ROUGE), reporting significant progress. In this article, we contrast this progress by comparing state-of-the-art (SOTA) models against weak baselines. We show that simple and even naive approaches yield near SOTA performance on most traditional NLP metrics. We conclude that evaluation methods in this task should be further studied towards correctly measuring clinical accuracy.
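
To illustrate why n-gram overlap metrics can reward a naive approach, here is a minimal sketch in Python. It is not the authors' code, baselines, or data: the three reference reports, the constant report, and the helper `unigram_precision` are hypothetical, and the score is a simplified clipped unigram precision (the n=1 ingredient of BLEU, without the brevity penalty), not a full BLEU implementation.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: share of candidate tokens also found in the reference.

    Simplified stand-in for BLEU-1 (no brevity penalty, single reference).
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)


# Hypothetical toy references, invented for illustration only.
references = [
    "the heart is normal in size . the lungs are clear .",
    "the lungs are clear . no pleural effusion .",
    "heart size is enlarged . the lungs are clear .",
]

# Naive baseline (assumption): emit the same report for every exam, ignoring the image.
constant_report = "the heart is normal in size . the lungs are clear ."

scores = [unigram_precision(constant_report, ref) for ref in references]
mean_score = sum(scores) / len(scores)

for ref, score in zip(references, scores):
    print(f"{score:.2f}  {ref}")
print(f"mean: {mean_score:.2f}")
```

In this toy setup the third reference describes an enlarged heart while the constant report says the heart is normal, yet most of the tokens still overlap. That is the kind of gap between lexical similarity and clinical correctness the abstract points to.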

Author(s) : Pablo Pino, Denis Parra, Pablo Messina, Cecilia Besa, Sergio Uribe

Keywords : metrics - nlp - state - sota - report
