End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. We fuse a pre-trained acoustic encoder and a pre-trained linguistic encoder into an ASR model. The fused model only needs to learn the transfer from speech to language during fine-tuning on limited labeled data. The lengths of the two modalities are matched by a monotonic attention mechanism without additional parameters. Our model achieves better recognition performance on the CALLHOME corpus (15 hours) than other end-to-end models.
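The acoustic encoder emits one vector per audio frame, while the linguistic encoder expects one vector per token, so the two sequences have to be aligned monotonically before they can be chained. The paper's exact mechanism is not reproduced in this summary. As a stand-in, here is a minimal sketch of one well-known parameter-free way to do monotonic length matching, continuous integrate-and-fire (CIF). It assumes each frame already carries a non-negative weight (how those weights are produced is left open here), and the function name `integrate_and_fire` is hypothetical.

```python
from typing import List

Vector = List[float]


def integrate_and_fire(frames: List[Vector], weights: List[float],
                       threshold: float = 1.0) -> List[Vector]:
    """Monotonically merge frame vectors into token-level vectors (CIF-style sketch).

    Hypothetical helper: `weights` are assumed to be given, one per frame.
    """
    if len(frames) != len(weights):
        raise ValueError("one weight per frame is required")
    if not frames:
        return []
    dim = len(frames[0])
    outputs: List[Vector] = []
    acc_weight = 0.0            # weight integrated so far for the current token
    acc_vec = [0.0] * dim       # weighted sum of frames for the current token
    for h, a in zip(frames, weights):
        remaining = a
        # Fire as many tokens as this frame's weight completes (weights may exceed 1).
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight   # portion of this frame that closes the token
            acc_vec = [v + used * x for v, x in zip(acc_vec, h)]
            outputs.append(acc_vec)
            remaining -= used
            acc_weight = 0.0
            acc_vec = [0.0] * dim
        # Carry the leftover weight of this frame into the next token.
        acc_weight += remaining
        acc_vec = [v + remaining * x for v, x in zip(acc_vec, h)]
    # A trailing partial token (weight below threshold) is dropped in this sketch.
    return outputs
```

For example, four 2-d frames with weight 0.5 each yield two token vectors, `[[0.5, 0.5], [1.5, 0.5]]` for frames `[[1, 0], [0, 1], [1, 1], [2, 0]]`. The mechanism adds no trainable parameters of its own, and because it only ever moves forward in time, the alignment stays monotonic. The number of emitted tokens equals the integer part of the summed weights, which is what lets the acoustic sequence be shortened to the linguistic encoder's token length.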

Author(s) : Cheng Yi, Shiyu Zhou, Bo Xu

Keywords : recognition - model - asr - demand - data