Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variations in the environment already after seeing a few examples. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. We also show that for a certain subclass of the MDP framework, this can be alleviated by
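To make the idea of training through an embedded solver more concrete, the following is a minimal sketch of blackbox differentiation around a shortest path solver. It is an illustration under stated assumptions, not the authors' implementation: it uses a static grid with vertex costs instead of a time-dependent graph, plain Dijkstra as the solver, a Hamming loss against a hypothetical expert path, and hand-written costs where the paper's architecture would use a neural network's output. The names `shortest_path`, `blackbox_grad`, and `lam` are made up for this example.

```python
import heapq

def shortest_path(costs):
    """Dijkstra on a 4-connected grid with non-negative vertex costs.

    Returns a 0/1 grid marking the cells of a cheapest path from the
    top-left corner to the bottom-right corner.
    """
    rows, cols = len(costs), len(costs[0])
    start, goal = (0, 0), (rows - 1, cols - 1)
    dist = {start: costs[0][0]}
    prev = {}
    heap = [(costs[0][0], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        if (r, c) == goal:
            break
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + costs[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to mark the path cells.
    path = [[0] * cols for _ in range(rows)]
    node = goal
    while node != start:
        path[node[0]][node[1]] = 1
        node = prev[node]
    path[0][0] = 1
    return path

def blackbox_grad(costs, path, grad_path, lam=10.0):
    """Surrogate gradient of the loss with respect to the costs.

    Solves a second problem with costs shifted by lam * dL/dpath and
    returns -(path - perturbed_path) / lam. Clamping at zero is a choice
    of this sketch so that Dijkstra remains valid.
    """
    rows, cols = len(costs), len(costs[0])
    perturbed = [[max(costs[r][c] + lam * grad_path[r][c], 0.0)
                  for c in range(cols)] for r in range(rows)]
    path_lam = shortest_path(perturbed)
    return [[-(path[r][c] - path_lam[r][c]) / lam
             for c in range(cols)] for r in range(rows)]

# Costs that a network might predict (hand-written here, an assumption).
costs = [[1, 1, 9],
         [9, 1, 9],
         [9, 1, 1]]
# Hypothetical expert path: down the left column, then along the bottom row.
target = [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]]

path = shortest_path(costs)  # [[1, 1, 0], [0, 1, 0], [0, 1, 1]]
# Gradient of the Hamming loss with respect to the 0/1 path indicator.
grad_path = [[1 - 2 * t for t in row] for row in target]
grad_costs = blackbox_grad(costs, path, grad_path)
```

In this toy case the surrogate gradient is negative on cells the current path uses but the expert path avoids, and positive on cells the expert path uses but the current path misses. A gradient descent step on the costs therefore makes the former more expensive and the latter cheaper, which is the signal that would be backpropagated into the network producing the costs.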

Author(s) : Marin Vlastelica, Michal Rolínek, Georg Martius


Keywords : show - neural - policies - generalization - neuro
