Audio Spectrogram Transformer (AST) is the first convolution-free, purely attention-based model for audio classification. The authors evaluate AST on several audio classification benchmarks, where it achieves new state-of-the-art results: 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2. Like the convolutional models it replaces, AST is trained end to end to learn a direct mapping from audio spectrograms to the corresponding labels.
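The AudioSet result above is reported as mAP (mean average precision), which is less familiar than plain accuracy. The following is a minimal sketch of how mAP is commonly computed for multi-label classification, assuming the usual definition (average precision per class, then an unweighted mean over classes). The function names and toy data are hypothetical and for illustration only; the paper's own evaluation code may differ in details such as tie handling.

```python
def average_precision(labels, scores):
    """Average precision for one class: mean of the precision at each positive hit."""
    # Rank clips from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positive clips; this sketch returns 0.0.
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Rows are audio clips, columns are classes; returns the unweighted mean AP."""
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / num_classes


# Toy example (made-up data): 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]
print(round(mean_average_precision(toy_labels, toy_scores), 4))  # 0.9167
```

In the toy data, the first class has average precision 5/6 (one negative clip is ranked above a positive one) and the second has 1.0, so the mean is 11/12. A score of 0.485 on AudioSet is the same kind of average taken over all of its sound classes.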

Author(s) : Yuan Gong, Yu-An Chung, James Glass

Links : PDF - Abstract

Keywords : ast - audio - accuracy - classification
