Analyzing Redundancy in Pretrained Transformer Models

Transformer-based deep NLP models are trained using hundreds of millions of parameters, limiting their applicability in computationally constrained environments. In this paper, we study the cause of these limitations by defining a notion of redundancy. We dissect two popular pretrained models, BERT and XLNet, studying how much redundancy they exhibit at the representation level and at the more fine-grained neuron level. Our analysis reveals interesting insights, such as: i) 85% of the neurons across the network are redundant, and ii) at least 92% of them can be removed when optimizing towards a downstream task.
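As an illustration of the neuron-level idea (a minimal sketch, not the paper's exact procedure), one common way to flag redundant neurons is to treat each neuron's activations over a set of tokens as a vector and mark a neuron as redundant when it is highly correlated with another neuron. The function name, threshold, and toy data below are illustrative assumptions:

```python
# Illustrative sketch (not the paper's method): a neuron is flagged as
# redundant when its activations correlate strongly with an earlier,
# already-kept neuron.
import numpy as np

def redundant_neurons(acts: np.ndarray, threshold: float = 0.9) -> set:
    """acts: (n_tokens, n_neurons) matrix of neuron activations."""
    corr = np.corrcoef(acts.T)      # (n_neurons, n_neurons) pairwise correlations
    np.fill_diagonal(corr, 0.0)     # ignore self-correlation
    redundant = set()
    for i in range(corr.shape[0]):
        # keep the first neuron of a highly correlated group, drop the rest
        if any(abs(corr[i, j]) >= threshold
               for j in range(i) if j not in redundant):
            redundant.add(i)
    return redundant

# toy example: neuron 2 is an exact scaled copy of neuron 0
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 3))
acts[:, 2] = 2.0 * acts[:, 0]
print(redundant_neurons(acts))  # neuron 2 is flagged
```

In practice the activations would come from a forward pass over a corpus through BERT or XLNet; the pruning criterion used in the paper may differ.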



Keywords: redundancy - models - pretrained - level - transformer
