In this paper, we present a systematic study of machines' multisensory perception under attack. We use audio-visual event recognition under multimodal adversarial attacks as a proxy task. We attack the audio modality, the visual modality, and both together to explore whether audio-visual integration still strengthens perception. We propose an audio-visual defense approach based on an audio-visual dissimilarity constraint and external feature memory banks. Even a weakly supervised sound source visual localization model can be successfully fooled. Our defense method improves the invulnerability of audio-visual networks without significantly sacrificing model performance.
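To make the idea of a dissimilarity constraint concrete, here is a minimal sketch in plain Python. It assumes one plausible reading: the audio and visual features are compared by cosine similarity, and a penalty grows as the two become aligned. The summary does not give the paper's actual formulation, so the function names `cosine_similarity` and `dissimilarity_penalty` and the clipping at zero are illustrative assumptions, not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dissimilarity_penalty(audio_feat, visual_feat):
    """Hypothetical penalty (an assumption, not the paper's loss).

    Returns 0 when the two features are orthogonal or opposed, and
    approaches 1 as they point in the same direction.
    """
    return max(0.0, cosine_similarity(audio_feat, visual_feat))


if __name__ == "__main__":
    audio = [0.9, 0.1, 0.0]   # made-up audio embedding
    visual = [0.8, 0.2, 0.1]  # made-up visual embedding
    print(dissimilarity_penalty(audio, visual))
```

In a training setup, a term like this would typically be added to the recognition loss with a weighting factor, so that the two modality encoders are discouraged from collapsing onto the same features.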

Author(s) : Yapeng Tian, Chenliang Xu

Links : PDF - Abstract

Code :

Keywords : visual - audio - attacks - method - improve
