The Audio Spectrogram Transformer (AST) is the first convolution-free, purely attention-based model for audio classification. It achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2.
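
To make "convolution-free, purely attention-based" concrete, here is a minimal sketch of the general idea: cut a spectrogram into patches, treat each patch as a token, and let self-attention mix the tokens. It is an illustration under stated assumptions, not the authors' implementation. The 128 x 100 input size, the 16 x 16 patch with stride 10, the identity query/key/value projections, and the mean pooling are all choices made for this example; the summary above does not specify them. The helper names `split_into_patches` and `self_attention` are hypothetical.

```python
import math
import random


def split_into_patches(spec, patch=16, stride=10):
    """Cut a 2-D spectrogram (freq x time) into flattened patch x patch tiles.

    Patch size and stride are illustrative assumptions.
    """
    n_freq, n_time = len(spec), len(spec[0])
    patches = []
    for f in range(0, n_freq - patch + 1, stride):
        for t in range(0, n_time - patch + 1, stride):
            tile = [spec[f + i][t + j] for i in range(patch) for j in range(patch)]
            patches.append(tile)
    return patches


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys, and values are the tokens themselves
    (identity projections); a real model would learn these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


# Synthetic stand-in for a log-mel spectrogram: 128 frequency bins x 100 frames.
random.seed(0)
spec = [[random.gauss(0.0, 1.0) for _ in range(100)] for _ in range(128)]

patches = split_into_patches(spec)      # each patch becomes one token
attended = self_attention(patches)      # every token attends to every other token
# Mean-pool the tokens into one clip-level vector (an assumption for this sketch).
pooled = [sum(col) / len(attended) for col in zip(*attended)]

print(len(patches), len(patches[0]), len(pooled))  # 108 256 256
```

No convolution appears anywhere in this pipeline: the only operation that relates one region of the spectrogram to another is the attention step. A classifier would add a learned linear layer on top of `pooled` to produce class scores.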

Author(s) : Yuan Gong, Yu-An Chung, James Glass

Links : PDF - Abstract

Keywords : ast - audio - accuracy - classification

