Visual Reasoning with Differentiable Physics (VRDP) can jointly learn visual concepts and infer physics models of objects and their interactions from videos and language. This is achieved by integrating three components: a visual perception module, a concept learner, and a differentiable physics engine. VRDP improves accuracy on predictive and counterfactual questions by 4.5% to 11.5% compared to its best counterpart. VRDP is also highly data-efficient: physical parameters can be optimized from very few videos, and even a single video can be sufficient. Finally, with all physical parameters inferred, VRDP can quickly learn new concepts from a few examples.
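
The idea of optimizing physical parameters through a differentiable simulator can be illustrated with a toy problem. The sketch below is not the VRDP implementation: it assumes a hypothetical one-dimensional sliding block, a single unknown friction coefficient `mu`, and a single observed trajectory standing in for one video. The names `rollout` and `fit_friction`, the Euler integrator, and all constants are assumptions made for illustration. The gradient of the trajectory error with respect to `mu` is propagated through every simulation step, and plain gradient descent recovers the parameter.

```python
def rollout(mu, v0=5.0, g=9.8, dt=0.05, steps=20):
    """Simulate a block sliding with friction coefficient mu (explicit Euler).

    Returns the positions and their derivatives with respect to mu,
    obtained by differentiating each simulation step alongside the state.
    """
    x, v = 0.0, v0      # position and velocity
    dx, dv = 0.0, 0.0   # d(position)/d(mu) and d(velocity)/d(mu)
    xs, dxs = [], []
    for _ in range(steps):
        # Position update uses the velocity from the start of the step.
        x, dx = x + v * dt, dx + dv * dt
        # Friction decelerates the block by mu * g.
        v, dv = v - mu * g * dt, dv - g * dt
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def fit_friction(observed, mu_init=0.0, lr=0.1, iters=200):
    """Recover mu from one observed trajectory by gradient descent."""
    mu = mu_init
    for _ in range(iters):
        xs, dxs = rollout(mu)
        # Gradient of the mean squared position error with respect to mu.
        grad = sum(2.0 * (x - o) * d for x, o, d in zip(xs, observed, dxs)) / len(xs)
        mu -= lr * grad
    return mu


# A single synthetic "observation", generated with a known coefficient.
true_mu = 0.3
observed, _ = rollout(true_mu)
estimated_mu = fit_friction(observed)
```

Here the loss is quadratic in `mu`, so a fixed step size converges quickly. A real physics engine with collisions and many objects would typically rely on automatic differentiation rather than hand-written derivatives.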

Author(s) : Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

Links : PDF - Abstract

Keywords : image - physics - vrdp - efficient - reasoning

