Triple M is a practical neural text-to-speech system. It consists of a seq2seq model with multi-guidance attention and a multi-band multi-time LPCNet. The former uses the alignment results of different attention mechanisms to guide the learning of the basic attention mechanism. This lets the text-to-feature module absorb the advantages of all the guidance attention methods without modifying the basic inference architecture. The latter combines the multi-band and multi-time strategies to reduce computational overhead while preserving perceptual quality. With this combined strategy, the vocoder speed is increased by 2.75x on a single CPU, with only slight degradation in MOS (mean opinion score).
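
The idea of guiding one attention mechanism with the alignments of others can be illustrated with a small sketch. The summary above does not give the form of the guidance term, so the mean absolute (L1) difference used here is an assumption, and the names `l1_distance` and `guidance_loss` are hypothetical. The guidance would only apply during training, which is consistent with leaving the inference architecture unchanged.

```python
from typing import List

# An alignment is a matrix [decoder_steps][encoder_steps]; each row sums to 1.
Alignment = List[List[float]]


def l1_distance(a: Alignment, b: Alignment) -> float:
    """Mean absolute difference between two alignment matrices of equal shape."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def guidance_loss(base: Alignment, guides: List[Alignment]) -> float:
    """Average L1 distance from the base alignment to each guide alignment.

    Assumption: the guidance term is a plain average over guides. In training
    it would be added to the usual feature reconstruction loss.
    """
    return sum(l1_distance(base, g) for g in guides) / len(guides)


# Toy example: two decoder steps attending over two encoder steps.
base = [[0.7, 0.3], [0.2, 0.8]]
guide_a = [[1.0, 0.0], [0.0, 1.0]]  # a sharply monotonic guide alignment
guide_b = [[0.7, 0.3], [0.2, 0.8]]  # a guide that already matches the base
loss = guidance_loss(base, [guide_a, guide_b])
```

In this toy case the loss is driven entirely by `guide_a`, since `guide_b` already matches the base alignment. Minimizing such a term would pull the base attention toward the guides during training only.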

Author(s) : Shilun Lin, Xinhui Li, Li Lu

Links : PDF - Abstract


Keywords : multi - increased - attention - band - time

