Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks… However, due to the limited data of downstream tasks and the extremely large capacity of pre-trained models, the model often overfits the downstream data. We propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. The proposed framework contains two important ingredients: (1) smoothness-inducing regularization, which effectively manages the capacity of the model; and (2) Bregman proximal point optimization, a class of trust-region methods that can prevent knowledge forgetting.
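The two ingredients can be made concrete with a minimal PyTorch sketch, shown below. This is not the authors' implementation (that lives in the repositories linked under "Code"); it assumes a HuggingFace-style classification model that accepts `inputs_embeds` and `attention_mask` and returns `.logits`, and the function names, hyperparameter values, and the slowly updated "teacher" copy used for the proximal term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(logits_p, logits_q):
    # Symmetric KL divergence between two categorical distributions given as
    # logits; used here both as the smoothness metric and as the Bregman divergence.
    p = F.log_softmax(logits_p, dim=-1)
    q = F.log_softmax(logits_q, dim=-1)
    return (F.kl_div(q, p, reduction="batchmean", log_target=True) +
            F.kl_div(p, q, reduction="batchmean", log_target=True))

def smoothness_regularizer(model, embeds, mask, eps=1e-5, step_size=1e-3, steps=1):
    # Smoothness-inducing (adversarial) regularizer: approximate
    # max_{||delta|| <= eps} D_s(f(x + delta), f(x)) with a few steps of
    # projected gradient ascent on a perturbation of the input embeddings.
    with torch.no_grad():
        clean_logits = model(inputs_embeds=embeds, attention_mask=mask).logits
    delta = torch.zeros_like(embeds).normal_(0, eps).requires_grad_(True)
    for _ in range(steps):
        adv_logits = model(inputs_embeds=embeds + delta, attention_mask=mask).logits
        grad, = torch.autograd.grad(symmetric_kl(adv_logits, clean_logits), delta)
        # Ascent step followed by projection back onto the eps-ball.
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    adv_logits = model(inputs_embeds=embeds + delta.detach(), attention_mask=mask).logits
    return symmetric_kl(adv_logits, clean_logits)

def bregman_proximal_term(model, teacher, embeds, mask):
    # Trust-region penalty of the (momentum) Bregman proximal point step:
    # symmetric KL between the current model and a slowly updated copy of it
    # (a "teacher", e.g. an exponential moving average of the weights), which
    # keeps each update close to the previous iterate and limits forgetting.
    with torch.no_grad():
        ref_logits = teacher(inputs_embeds=embeds, attention_mask=mask).logits
    logits = model(inputs_embeds=embeds, attention_mask=mask).logits
    return symmetric_kl(logits, ref_logits)
```

In a training step these terms would simply be added to the task loss, e.g. `loss = task_loss + lambda_s * smoothness_regularizer(...) + mu * bregman_proximal_term(...)`, with the teacher's weights refreshed after each optimizer step.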

Links: PDF - Abstract

Code:

https://github.com/microsoft/MT-DNN
https://github.com/namisan/mt-dnn

Keywords: models - tasks - language - downstream - fine-tuning
