Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variations in the environment after seeing only a few examples. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. We also show that for a certain subclass of the MDP framework, this can be alleviated by
Author(s) : Marin Vlastelica, Michal Rolínek, Georg Martius
Code :
https://github.com/oktantod/RoboND-DeepLearning-Project
Keywords : show - neural - policies - generalization - neuro -
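
As a rough illustration of the "time-dependent shortest path" component named in the abstract, the sketch below runs a dynamic program over a time-expanded grid. It is not the authors' implementation: the function name `time_dependent_shortest_path`, the `costs[t][r][c]` layout, and the option to wait in place are all assumptions made for this example, and the neural network and the blackbox differentiation step are not shown. In a policy like the one described, the cost tensor would presumably be predicted by the network rather than written by hand.

```python
import math


def time_dependent_shortest_path(costs, source, target):
    """Cheapest path on a grid whose cell costs change over time.

    costs[t][r][c] is the (assumed) cost of occupying cell (r, c) at step t.
    Each step the agent either waits or moves to a 4-neighbor.
    Returns (total_cost, path) where path is a list of (r, c) cells.
    """
    T, H, W = len(costs), len(costs[0]), len(costs[0][0])
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait or one step

    dist = [[[math.inf] * W for _ in range(H)] for _ in range(T)]
    parent = {}  # (t, r, c) -> (r_prev, c_prev) at time t - 1
    sr, sc = source
    dist[0][sr][sc] = costs[0][sr][sc]

    # Time only moves forward, so one sweep over t is enough.
    for t in range(1, T):
        for r in range(H):
            for c in range(W):
                for dr, dc in moves:
                    pr, pc = r - dr, c - dc
                    if 0 <= pr < H and 0 <= pc < W:
                        cand = dist[t - 1][pr][pc] + costs[t][r][c]
                        if cand < dist[t][r][c]:
                            dist[t][r][c] = cand
                            parent[(t, r, c)] = (pr, pc)

    # Pick the cheapest arrival time at the target.
    tr, tc = target
    best_t = min(range(T), key=lambda t: dist[t][tr][tc])
    total = dist[best_t][tr][tc]
    if math.isinf(total):
        return math.inf, []

    # Walk the parent pointers back to the source.
    path = [(tr, tc)]
    r, c = tr, tc
    for t in range(best_t, 0, -1):
        r, c = parent[(t, r, c)]
        path.append((r, c))
    path.reverse()
    return total, path


# Toy 1x3 corridor over 4 time steps: the middle cell is expensive at t = 1,
# so waiting one step before moving is cheaper than moving immediately.
toy_costs = [[[1, 1, 1]] for _ in range(4)]
toy_costs[1][0][1] = 10
total, path = time_dependent_shortest_path(toy_costs, (0, 0), (0, 2))
print(total, path)
```

The dynamic program is used here only because time-expanded grids are acyclic in the time dimension; a priority-queue search such as Dijkstra's algorithm over the same `(t, r, c)` states would be an equally valid choice.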