Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partly because most of the available annotated datasets contain English data. In this paper, we take advantage of the available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources. We project predictions on comparable data in Bengali, Hindi, and Spanish and we report results of 0.8415 F1 macro for Bengali and 0.8568 F1 macro for Hindi, 0.7513
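
The results above are reported as macro-averaged F1, which is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the labels `OFF` and `NOT` are assumptions chosen for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: OFF = offensive, NOT = not offensive
gold = ["OFF", "OFF", "NOT", "NOT"]
predicted = ["OFF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, predicted), 4))
```

Because every class contributes equally to the average, a classifier cannot score well on this metric by predicting only the majority class, which matters when offensive posts are a minority of the data.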

Code :

https://github.com/tharindudr/DeepOffense




Keywords : data - english - content - offensive - cross -
