Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages. In this paper, we introduce a new objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 13% in unsupervised code translation, and 24% in natural language code search.
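To make the objective concrete, the model's input is source code whose identifiers have been replaced by uninformative placeholders, and its training target is the original names. The sketch below illustrates one way such obfuscated input could be produced; the AST-based approach and the exact `FUNC_i`/`VAR_i` placeholder scheme are illustrative assumptions, not the authors' exact pipeline.

```python
import ast


class Obfuscator(ast.NodeTransformer):
    """Replace function and variable names with placeholders
    (illustrative sketch, not the paper's exact preprocessing)."""

    def __init__(self):
        self.mapping = {}                     # original name -> placeholder
        self.counters = {"FUNC": 0, "VAR": 0}

    def _rename(self, name, prefix):
        if name not in self.mapping:
            self.mapping[name] = f"{prefix}_{self.counters[prefix]}"
            self.counters[prefix] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name, "FUNC")
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg, "VAR")
        return node

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        elif isinstance(node.ctx, ast.Store):
            # Only rename names the snippet defines; builtins like
            # `range` are read-only here and stay untouched.
            node.id = self._rename(node.id, "VAR")
        return node


src = """
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""

obf = Obfuscator()
obfuscated = ast.unparse(obf.visit(ast.parse(src)))
print(obfuscated)    # body with FUNC_0, VAR_0, VAR_1, ... placeholders
print(obf.mapping)   # the "answer key" a DOBF-style model must recover
```

Given the obfuscated code, the model is trained to output the mapping from each placeholder back to a meaningful name, which forces it to understand what the code does rather than rely on surface cues.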

Author(s) : Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample
