We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT. BERT is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. Our assumption is that given a history context sequence, a powerful LM can narrow the range of possible choices and the speech signal can be used as a simple clue. As an initial study, we demonstrate the effectiveness of the proposed idea on the AISHELL dataset and show that stacking a very simple acoustic model (AM) on top of BERT can yield reasonable performance.
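The assumption stated in the abstract (the LM narrows the candidate set, and the speech signal acts as a clue for choosing among the survivors) can be illustrated with a toy sketch. This is not the model from the paper: the function names, the `top_k` pruning step, the log-probability sum, and all scores below are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn a dict of raw scores into a dict of probabilities."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_next_token(lm_scores, acoustic_scores, top_k=3):
    """Hypothetical fusion rule: prune with the LM, then rank with both scores."""
    lm_probs = softmax(lm_scores)
    ac_probs = softmax(acoustic_scores)
    # Step 1: the LM narrows the range of choices to its top_k candidates.
    candidates = sorted(lm_probs, key=lm_probs.get, reverse=True)[:top_k]
    # Step 2: the acoustic evidence is the clue that picks among them.
    combined = {
        tok: math.log(lm_probs[tok]) + math.log(ac_probs[tok])
        for tok in candidates
    }
    return max(combined, key=combined.get), candidates


# Made-up scores for the next token given some history context.
lm_scores = {"cat": 2.0, "cap": 1.8, "dog": 1.5, "xylophone": -3.0}
acoustic_scores = {"cat": 0.5, "cap": 2.0, "dog": -1.0, "xylophone": 8.0}

best, candidates = pick_next_token(lm_scores, acoustic_scores)
print(best, candidates)
```

In this toy setup the acoustically strongest token is discarded because the LM considers it implausible in context, and the acoustic scores only decide among the LM's remaining candidates.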

Author(s) : Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda

Keywords : bert - speech - lm - simple - recognition
