Optimizing Transformers with Approximate Computing for Faster, Smaller, and More Accurate NLP Models

Transformer models have garnered significant interest in recent years by delivering state-of-the-art performance on a range of Natural Language Processing (NLP) tasks. This work applies Approximate Computing to Transformers used in NLP, proposing a framework that creates smaller, faster, and in some cases more accurate models. The framework can be adapted to produce models that favor speed, size, and/or accuracy, depending on the user's constraints. Applied to seven models, including already-optimized models such as DistilBERT and Q8BERT, and three downstream tasks, the framework produces models that are up to 4x faster and up to 14x smaller (with less than 0.5% relative accuracy degradation), or up to 5.5% more accurate with simultaneous improvements of up to 9.83x in model size or 2.94x in speed.
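The paper's own framework is not reproduced here, but the flavor of approximate computing it builds on can be illustrated with a minimal magnitude-pruning sketch: zeroing the smallest-magnitude weights of a layer shrinks the model (when stored sparsely) at a small cost in fidelity. The function name, threshold rule, and toy matrix below are illustrative assumptions, not the authors' method:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude entries of a weight matrix.

    A generic approximation technique (not the paper's specific
    framework); `sparsity` is the fraction of weights to remove.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Toy weight matrix standing in for one Transformer layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_pruned = magnitude_prune(W, sparsity=0.75)
kept = np.count_nonzero(W_pruned) / W.size
print(f"fraction of weights kept: {kept:.2f}")
```

In practice, frameworks of this kind decide per-layer how much approximation each part of the network tolerates, rather than applying one global sparsity level as this sketch does.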


