Traditional video summarization methods generate fixed video representations regardless of user interest. As a result, they fall short of users' expectations in content search and exploration scenarios. In this work, a new method is proposed that uses a specialized attention network and contextualized word representations to tackle this task. The new model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. Based on an evaluation on an existing multi-media summarization benchmark, experimental results show that the proposed model is effective, with an increase of +5.88% in accuracy and +4.06% in F1-score compared with the state-of-the-art method.
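To make the idea of a summary that depends on user interest more concrete, here is a minimal sketch in plain Python. It is not the authors' model: it assumes a single query vector and per-frame feature vectors (both hypothetical toy values), scores frames with ordinary scaled dot-product attention, keeps the top-k frames, and then computes the two metrics named above (accuracy and F1-score) against hypothetical ground-truth labels. All function names and data are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(query, frames):
    """Scaled dot-product attention of one query vector over frame vectors."""
    scale = math.sqrt(len(query))
    logits = [sum(q * f for q, f in zip(query, frame)) / scale for frame in frames]
    return softmax(logits)


def select_frames(weights, k):
    """Binary summary: 1 for the k highest-weighted frames, 0 otherwise."""
    top = set(sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(weights))]


def accuracy(pred, gold):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def f1_score(pred, gold):
    """Harmonic mean of precision and recall for the positive (selected) class."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: a 2-d query embedding and four 2-d frame features.
query = [1.0, 0.0]
frames = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [0.0, 1.0]]
gold = [1, 0, 1, 0]  # assumed ground-truth relevance labels per frame

weights = attention_scores(query, frames)
summary = select_frames(weights, k=2)
print(summary, accuracy(summary, gold), f1_score(summary, gold))
```

Changing the query vector changes which frames receive the highest attention weights, which is the contrast with a fixed, query-independent summary that the abstract draws.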

Author(s) : Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring


Code :

https://github.com/oktantod/RoboND-DeepLearning-Project



Keywords : video - proposed - multi - model - increase
