Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning

Voice style transfer seeks to modify one speaker's voice so that the speech sounds as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and pre-known speakers… However, zero-shot voice style transfer, which learns from non-parallel data and generates voices for previously unseen speakers, remains a challenging problem. The proposed method first encodes the speaker-related style and the voice content of each input voice into separate low-dimensional embedding spaces, and then transfers to a new voice by combining the source content embedding with the target style embedding through a decoder. On real-world datasets, our method outperforms other baselines and obtains state-of-the-art results in terms of …
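To make the encode-then-recombine idea concrete, here is a minimal sketch of the data flow only, not the authors' model. Everything in it is an assumption for illustration: the encoders and decoder are untrained random linear maps, the names (encode_content, encode_style, transfer) and dimensions are hypothetical, and none of the disentanglement training the paper is about is shown. Each input is treated as a list of acoustic feature frames.

```python
import random


def make_linear(in_dim, out_dim, seed):
    """Hypothetical stand-in for a learned layer: a random (out_dim x in_dim) matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product: weights (out x in) times vector x (length in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical sizes: feature frames, content embedding, style embedding.
FEATURE_DIM, CONTENT_DIM, STYLE_DIM = 80, 16, 8

content_encoder = make_linear(FEATURE_DIM, CONTENT_DIM, seed=0)
style_encoder = make_linear(FEATURE_DIM, STYLE_DIM, seed=1)
decoder = make_linear(CONTENT_DIM + STYLE_DIM, FEATURE_DIM, seed=2)


def encode_content(frames):
    """Per-frame content embeddings for the source utterance."""
    return [apply(content_encoder, f) for f in frames]


def encode_style(frames):
    """One utterance-level style vector: the time average of per-frame projections."""
    per_frame = [apply(style_encoder, f) for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]


def transfer(source_frames, target_frames):
    """Decode source content combined with the target speaker's style vector."""
    content = encode_content(source_frames)
    style = encode_style(target_frames)
    # Concatenate the same style vector onto every content frame, then decode.
    return [apply(decoder, c + style) for c in content]


if __name__ == "__main__":
    rng = random.Random(42)
    source = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(5)]
    target = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(3)]
    converted = transfer(source, target)
    print(len(converted), len(converted[0]))  # 5 frames, each FEATURE_DIM values
```

In this sketch the output keeps the source's length (its content) while the style comes only from the target audio. Because the style encoder has no per-speaker parameters, it can be applied to a speaker never seen before, which is the shape of the zero-shot setting the abstract describes.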



Keywords: voice, style, transfer, embedding, speaker
