This paper presents a low-latency real-time (LLRT) non-parallel voice conversion (VC) framework based on a cyclic variational autoencoder (CycleVAE) and multiband WaveRNN with data-driven linear prediction (MWDLP). MWDLP is an efficient, high-quality neural vocoder that can handle multispeaker data and generate speech waveforms for LLRT applications on a CPU. The experimental results demonstrate that the proposed framework achieves high-performance VC while allowing LLRT usage on a single core of a $2.1$–$2.7$~GHz CPU, with a real-time factor of $0.87$–$0.95$. The paper also proposes a novel fine-tuning procedure that uses the waveform loss.
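
To make the real-time factor figures concrete, here is a minimal sketch. It assumes the common definition of real-time factor (processing time divided by the duration of the audio produced); the abstract does not spell out the definition. The function names and the timing values are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system keeps up with incoming audio only if its RTF is below 1."""
    return rtf < 1.0


# Hypothetical measurement: 10 s of converted speech generated in 8.7 s of CPU time.
rtf = real_time_factor(processing_seconds=8.7, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}, real-time capable: {is_real_time(rtf)}")
```

Under this definition, a real-time factor in the reported $0.87$–$0.95$ range means each second of speech takes slightly less than one second to generate, so the system stays just ahead of playback on the stated hardware.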

Author(s) : Patrick Lumban Tobing, Tomoki Toda

Keywords : data - real - time - llrt - autoencoder
