Recent theoretical results show that gradient descent on deep neural networks locally maximizes classification margin. This property of the solution, however, does not fully characterize the generalization performance. We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is a good measure of generalization. We then show that, after [...], it is possible to dynamically reduce the training set by more than 99% without significant loss of performance. Interestingly, the subset of “high capacity” features is not consistent across different training runs. This is consistent with the theoretical claim that all training points should converge to the same asymptotic margin under SGD.
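
To make the margin-distribution measure in the abstract more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the margin of a training point is the score of its true class minus the highest score among the other classes, and that the "area under the curve" is taken over the sorted margins with the sample rank rescaled to [0, 1]. The function names `classification_margins` and `margin_distribution_area` are hypothetical, and any normalization of the network outputs that the paper applies is not reproduced here.

```python
import numpy as np


def classification_margins(scores, labels):
    """Per-sample margin: true-class score minus the best other-class score.

    Assumption: `scores` is an (n_samples, n_classes) array of network
    outputs and `labels` holds integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    true_scores = scores[rows, labels]
    # Mask out the true class so the max runs over the other classes only.
    others = scores.copy()
    others[rows, labels] = -np.inf
    return true_scores - others.max(axis=1)


def margin_distribution_area(margins):
    """Trapezoidal area under the sorted-margin curve.

    Assumption: the x-axis is the sample rank rescaled to [0, 1], so the
    result is comparable across training sets of different sizes.
    """
    sorted_margins = np.sort(np.asarray(margins, dtype=float))
    n = len(sorted_margins)
    if n < 2:
        return float(sorted_margins.sum())
    x = np.linspace(0.0, 1.0, n)
    heights = (sorted_margins[1:] + sorted_margins[:-1]) / 2.0
    return float(np.sum(heights * np.diff(x)))


# Toy example with three samples and three classes (made-up numbers).
toy_scores = [[2.0, 0.0, 0.0], [0.0, 1.0, 3.0], [1.0, 1.5, 0.0]]
toy_labels = [0, 2, 0]
toy_margins = classification_margins(toy_scores, toy_labels)
toy_area = margin_distribution_area(toy_margins)
```

In this toy example the third sample is misclassified, so its margin is negative and pulls the area down. Under the assumptions above, a larger area would correspond to a training set that is separated with larger margins overall.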

Author(s) : Andrzej Banburski, Fernanda De La Torre, Nishka Pant, Ishana Shastri, Tomaso Poggio



Keywords : training - show - margin - distribution - classification
