Existing methods of medical visual question answering usually rely on transfer learning to obtain effective image feature representation. We introduce a cross-modal self-attention module to capture the long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA.
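
The sketch below illustrates the general idea of cross-modal self-attention described in the abstract: visual region features and question token features are concatenated into one sequence and attended over jointly, so every position can draw on long-range context from either modality. This is a minimal illustration assuming PyTorch; the class name `CrossModalSelfAttention`, the feature dimensions, and the residual-plus-norm layout are assumptions for illustration, not the authors' released implementation (see the GitHub link above for that).

```python
import torch
import torch.nn as nn


class CrossModalSelfAttention(nn.Module):
    """Fuse visual and linguistic features by self-attention over their concatenation (illustrative sketch)."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_v, dim) flattened image-region features
        # text:   (B, N_t, dim) question token features
        tokens = torch.cat([visual, text], dim=1)        # one joint sequence of both modalities
        fused, _ = self.attn(tokens, tokens, tokens)     # long-range attention across modalities
        return self.norm(tokens + fused)                 # residual connection + layer norm


if __name__ == "__main__":
    block = CrossModalSelfAttention()
    v = torch.randn(2, 49, 512)   # e.g. a 7x7 image feature map, flattened
    t = torch.randn(2, 20, 512)   # e.g. 20 question tokens
    print(block(v, t).shape)      # torch.Size([2, 69, 512])
```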

Author(s) : Haifan Gong, Guanqi Chen, Sishuo Liu, Yizhou Yu, Guanbin Li

Links : PDF - Abstract

Code :
https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA

Keywords : visual - modal - attention - cross - code
