MDETR is an end-to-end modulated detector that detects objects in an image conditioned on a raw text query. We use a transformer-based architecture to reason jointly over text and image by fusing the two modalities at an early stage of the model. We show that our approach can be easily extended to visual question answering, achieving competitive performance on GQA and CLEVR. The code and models are available at https://github.com/ashkamath/mdetr.
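
To make "fusing the two modalities at an early stage" concrete, here is a minimal sketch of one common way to do it: project image features and text token embeddings to a shared width, then concatenate them into a single sequence that a joint transformer encoder could attend over. The summary above does not spell out the mechanism, so the concatenation scheme, the helper names (`linear`, `random_matrix`, `early_fuse`), and all sizes are assumptions for illustration only, not the authors' implementation.

```python
import random

def linear(vectors, weight):
    """Project each vector (length d_in) with a d_in x d_out weight matrix."""
    d_out = len(weight[0])
    return [
        [sum(v[i] * weight[i][j] for i in range(len(v))) for j in range(d_out)]
        for v in vectors
    ]

def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or extracted features."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def early_fuse(image_feats, text_feats, w_img, w_txt):
    """Project both modalities to a shared width, then concatenate along the sequence axis."""
    return linear(image_feats, w_img) + linear(text_feats, w_txt)

rng = random.Random(0)
image_feats = random_matrix(6, 8, rng)  # hypothetical: 6 flattened image locations, 8 channels
text_feats = random_matrix(4, 5, rng)   # hypothetical: 4 text tokens, 5-dim embeddings
w_img = random_matrix(8, 3, rng)        # modality-specific projection to shared width 3
w_txt = random_matrix(5, 3, rng)

fused = early_fuse(image_feats, text_feats, w_img, w_txt)
print(len(fused), len(fused[0]))  # one sequence of 6 + 4 items, each of shared width 3
```

In a real model the fused sequence would be passed to a transformer encoder so that every image location can attend to every text token and vice versa; that step is omitted here.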

Author(s) : Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion

Keywords : mdetr - code - text - modulated - image
