In this paper, we present a systematic study of machines' multisensory perception under attacks. We use the audio-visual event recognition task against multimodal adversarial attacks as a proxy. We attack the audio modality, the visual modality, and both modalities together to explore whether audio-visual integration still strengthens perception. We propose an audio-visual defense approach based on an audio-visual dissimilarity constraint and external feature memory banks. Even a weakly-supervised sound source visual localization model can be successfully fooled. Our defense method can improve the invulnerability of audio-visual networks without significantly sacrificing model performance.
Author(s) : Yapeng Tian, Chenliang Xu
Links : PDF - Abstract
Code :
Keywords : visual - audio - attacks - method - improve
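
The three attack settings named in the abstract (audio only, visual only, both modalities) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear late-fusion binary classifier and a single-step sign-gradient (FGSM-style) perturbation; the names `predict` and `fgsm_attack`, the weights, and the inputs are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(audio, visual, w_a, w_v, b=0.0):
    """Toy late-fusion classifier (an assumption, not the paper's network).

    Returns the probability of the positive event from a linear logit
    over the concatenated audio and visual features.
    """
    z = b
    z += sum(w * x for w, x in zip(w_a, audio))
    z += sum(w * x for w, x in zip(w_v, visual))
    return sigmoid(z)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_attack(audio, visual, label, w_a, w_v, eps, attack=("audio", "visual")):
    """One-step sign-gradient perturbation on the selected modalities.

    For binary cross-entropy on a linear logit, the gradient of the loss
    with respect to input x_i is (p - label) * w_i, so it can be written
    analytically here without an autodiff library.
    """
    err = predict(audio, visual, w_a, w_v) - label
    adv_audio, adv_visual = list(audio), list(visual)
    if "audio" in attack:
        adv_audio = [x + eps * sign(err * w) for x, w in zip(audio, w_a)]
    if "visual" in attack:
        adv_visual = [x + eps * sign(err * w) for x, w in zip(visual, w_v)]
    return adv_audio, adv_visual


# Hypothetical weights and inputs for a positive (label = 1) example.
w_a, w_v = [1.0, -2.0], [0.5, 1.5]
audio, visual, label, eps = [0.2, -0.1], [0.3, 0.4], 1, 0.1

p_clean = predict(audio, visual, w_a, w_v)
p_audio = predict(*fgsm_attack(audio, visual, label, w_a, w_v, eps, ("audio",)), w_a, w_v)
p_visual = predict(*fgsm_attack(audio, visual, label, w_a, w_v, eps, ("visual",)), w_a, w_v)
p_both = predict(*fgsm_attack(audio, visual, label, w_a, w_v, eps), w_a, w_v)
print(p_clean, p_audio, p_visual, p_both)
```

In this toy setting, perturbing both modalities lowers the confidence in the true label more than perturbing either one alone, which is the kind of comparison the study describes. Real audio-visual networks are nonlinear, so their gradients would come from automatic differentiation rather than the closed form used here.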