VGG models used two sets of fully connected layers for the classification part of their architectures, which significantly increases the number of the models' weights. ResNet and later deep convolutional models used the Global Average Pooling (GAP) layer to compress the feature map and feed it to the classification layer. Using the GAP layer reduces the computational cost, but it also discards the spatial resolution of the feature map, which decreases learning efficiency. In this paper, we address this problem by proposing an architecture that preserves the spatial resolution of the feature map in the classification part of the network. Applying our architecture revealed a significant effect on increasing convergence speed and accuracy. Our experiments on images with 224×224 resolution increased the Top-1 accuracy between 2% and 8% on different datasets and models. Running our models on 512×512 resolution images of the MIT Indoors Scenes dataset showed a notable result of improving the Top-1 accuracy by up to 26%. We also demonstrate the GAP layer's disadvantage when the input images are large and the number of classes is not few; in this circumstance, our proposed architecture can greatly help in enhancing classification results. The code is shared at https://github.com/mr7495/image-classification-spatial.
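To make the contrast concrete, the sketch below compares a standard GAP classification head with one possible spatial-aware alternative in Keras. This is a minimal, hypothetical illustration, not the authors' released implementation (see the linked repository for that): the ResNet50 backbone, the class count, and the depthwise-convolution head that replaces uniform averaging with a learned per-location weighting are all illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 67               # e.g. MIT Indoor Scenes has 67 scene categories
INPUT_SHAPE = (224, 224, 3)

# Backbone: any convolutional feature extractor; ResNet50 is used here as an example.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=INPUT_SHAPE)
features = backbone.output                      # shape: (None, 7, 7, 2048)

# --- Baseline head: Global Average Pooling collapses the 7x7 spatial grid ---
gap = layers.GlobalAveragePooling2D()(features)                     # (None, 2048)
gap_logits = layers.Dense(NUM_CLASSES, activation="softmax")(gap)
gap_model = models.Model(backbone.input, gap_logits)

# --- Spatial-aware head (illustrative assumption): a depthwise convolution the
# size of the feature map learns a per-location weighting for each channel
# instead of averaging uniformly, so spatial information still contributes
# to the aggregation before the classification layer. ---
h, w = features.shape[1], features.shape[2]
weighted = layers.DepthwiseConv2D(kernel_size=(h, w), use_bias=False)(features)  # (None, 1, 1, 2048)
weighted = layers.Flatten()(weighted)                                             # (None, 2048)
spatial_logits = layers.Dense(NUM_CLASSES, activation="softmax")(weighted)
spatial_model = models.Model(backbone.input, spatial_logits)

spatial_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
spatial_model.summary()
```

The spatial-aware head adds only h*w weights per channel, so it stays far smaller than the dense layers of VGG-style heads while avoiding the uniform averaging of GAP.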

Author(s) : Mohammad Rahimzadeh, Soroush Parvin, Elnaz Safi, Mohammad Reza Mohammadi


Code : https://github.com/mr7495/image-classification-spatial

Keywords : models - classification - resolution - spatial - images
