A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computation. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly. Despite non-convexity, convergence to a global optimum at a linear rate is guaranteed without any assumption on the width of the models, allowing the width to be smaller than the output dimension and the number of data points. We prove a relation between the gradient dynamics of the simple deep equilibrium model and the dynamics of the trust region Newton method of a shallow model. This mathematically proven relation, along with our numerical observation, suggests the importance of understanding implicit bias and a possible open problem on the topic. Our proofs deal with nonlinearity and weight tying, and differ from those in the related literature.
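
To make the idea of "finding an equilibrium point directly" concrete, here is a minimal sketch in plain Python. It assumes a generic linear layer of the form z_next = W z + U x, where W is a contraction; the names W, U, x and both helper functions are illustrative and are not taken from the paper, whose exact parameterization (including weight tying and the nonlinearity on the weights) is not reproduced here. The sketch compares unrolling the sequence for many steps with solving the linear system (I - W) z = U x once.

```python
# Hypothetical 2-dimensional linear layer z_next = W z + U x.
# All values and names are illustrative assumptions, not the paper's model.
W = [[0.2, 0.1],
     [0.0, 0.3]]   # assumed contraction (spectral radius below 1)
U = [[1.0, 0.0],
     [0.5, 1.0]]
x = [1.0, 2.0]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def iterate(W, U, x, steps):
    """Unroll the sequence z_{l+1} = W z_l + U x starting from z_0 = 0."""
    b = matvec(U, x)
    z = [0.0] * len(b)
    for _ in range(steps):
        z = [wz + bi for wz, bi in zip(matvec(W, z), b)]
    return z


def solve_equilibrium(W, U, x):
    """Solve (I - W) z = U x by Gaussian elimination with partial pivoting."""
    n = len(W)
    b = matvec(U, x)
    # Augmented matrix [I - W | U x].
    A = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(n)] + [b[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][n] - tail) / A[i][i]
    return z


z_unrolled = iterate(W, U, x, steps=200)
z_direct = solve_equilibrium(W, U, x)
print(z_unrolled, z_direct)
```

Under the contraction assumption the two results agree up to floating-point error, which is why the direct solve can stand in for the infinite sequence.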

Author(s) : Kenji Kawaguchi



Keywords : equilibrium - model - deep - relation - tying
