The rise of big data analytics on top of NLP increases the computational burden of text processing at scale. NLP problems involve very high-dimensional text representations, so they demand substantial computational resources. MapReduce allows large computations to be parallelized and can improve the efficiency of text processing. Larger models require more computational resources and take longer to complete their tasks. Averaged over all models, BERT alone achieves an accuracy of 0.9187 with a training time of 35 minutes, while BERT with a Spark NLP pipeline achieves 0.8444 in 9 minutes. In other words, the accuracy of BERT with Spark NLP decreased by an average of only 5.7%, while training time was reduced by 62.9% compared to BERT without Spark NLP.
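The MapReduce idea mentioned above can be illustrated with a minimal word-count sketch: each shard of documents is mapped to partial counts, which are then reduced (merged) into a global result. The shard contents and names below are illustrative only, not from the paper.

```python
from functools import reduce
from collections import Counter

# Hypothetical document shards standing in for a distributed corpus.
shards = [
    ["bert scales poorly on one machine", "spark distributes the work"],
    ["spark nlp wraps bert in a pipeline"],
]

def map_phase(shard):
    """Map step: count words within a single shard independently."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.split())
    return counts

# Reduce step: merge the per-shard partial counts into a global count.
partial_counts = [map_phase(s) for s in shards]
total_counts = reduce(lambda a, b: a + b, partial_counts)
```

Because each `map_phase` call touches only its own shard, the map step can run on separate workers in parallel; only the cheap merge is sequential. Spark NLP applies the same pattern to tokenization and BERT embedding, which is where the reported training-time reduction comes from.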

Author(s): Kuncahyo Setyo Nugroho, Anantha Yullian Sukmadewa, Novanto Yudistira

Links: PDF - Abstract

Code:

Keywords: nlp - bert - spark - model - accuracy
