This paper presents a novel high-fidelity and low-latency universal neural vocoder framework based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (MWDLP). MWDLP employs a coarse-fine bit WaveRNN architecture for 10-bit mu-law waveform modeling. A sparse gated recurrent unit with a relatively large number of hidden units is used, while multiband modeling is deployed to achieve real-time, low-latency operation. Trained on data from 300 speakers in both clean and noisy/reverberant conditions, with training utterances limited to 60 per speaker, the framework generates synthetic speech for seen and unseen speakers and/or languages. It runs in real time on a single core of a ~2.1–2.7 GHz CPU with a real-time factor of 0.64.
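The summary only names the 10-bit mu-law representation, the coarse-fine bit split, and the real-time factor. The sketch below illustrates these under stated assumptions. It is not the authors' code. It encodes a waveform with standard mu-law companding (mu = 1023, i.e. 1024 levels) and splits each 10-bit code into a coarse and a fine part. The even 5/5 bit split, the function names, and the NumPy implementation are all assumptions made for illustration. The real-time factor is computed as processing time divided by audio duration, so a value below 1 means synthesis runs faster than real time.

```python
import numpy as np

MU = 2**10 - 1        # 1023: 10-bit mu-law gives 1024 quantization levels
COARSE_BITS = 5       # assumption: even split of the 10 bits (not stated in the summary)
FINE_BITS = 10 - COARSE_BITS


def mulaw_encode(x, mu=MU):
    """Compand a float waveform in [-1, 1] and quantize it to integer codes 0..mu."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def mulaw_decode(q, mu=MU):
    """Map integer codes back to a float waveform by inverting the mu-law curve."""
    y = 2.0 * q.astype(np.float64) / mu - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu


def split_coarse_fine(q):
    """Split each 10-bit code into its upper (coarse) and lower (fine) bits."""
    coarse = q >> FINE_BITS
    fine = q & ((1 << FINE_BITS) - 1)
    return coarse, fine


def join_coarse_fine(coarse, fine):
    """Recombine coarse and fine bits into the full 10-bit code."""
    return (coarse << FINE_BITS) | fine


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # A short 440 Hz tone at 16 kHz, half amplitude (sampling rate chosen for illustration).
    x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(160) / 16000)
    q = mulaw_encode(x)
    coarse, fine = split_coarse_fine(q)
    print(q[:4], coarse[:4], fine[:4])
    print("max reconstruction error:", np.max(np.abs(mulaw_decode(q) - x)))
```

Predicting the coarse bits first and then the fine bits conditioned on them means each output layer covers only 32 classes instead of 1024. This is the usual motivation for coarse-fine WaveRNN output layers.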

Author(s) : Patrick Lumban Tobing, Tomoki Toda

Keywords : real-time - latency - high-fidelity

