This paper presents a low-latency real-time (LLRT) non-parallel voice conversion (VC) framework based on a cyclic variational autoencoder (CycleVAE) and multiband WaveRNN with data-driven linear prediction (MWDLP). MWDLP is an efficient, high-quality neural vocoder that can handle multispeaker data and generate speech waveforms for LLRT applications on a CPU. The experimental results demonstrate that the proposed framework achieves high-performance VC while allowing LLRT usage on a single core of a $2.1$–$2.7$ GHz CPU, with a real-time factor of $0.87$–$0.95$. The paper also proposes a novel fine-tuning procedure that uses the waveform loss.
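The real-time factor quoted above is commonly read as processing time divided by the duration of the audio produced, so a value below 1 means generation keeps pace with playback. A minimal Python sketch of that ratio, assuming this standard definition (the function name and the sample timings are illustrative and not taken from the paper):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings: 4.5 s of compute to convert 5.0 s of speech.
rtf = real_time_factor(4.5, 5.0)
print(f"RTF = {rtf:.2f}, real-time capable: {rtf < 1.0}")
```

Under this reading, the reported range of $0.87$–$0.95$ leaves a margin of roughly 5 to 13 percent of each audio second before the system would fall behind playback.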
Author(s) : Patrick Lumban Tobing, Tomoki Toda
Keywords : data - real-time - LLRT - autoencoder