Speech enhancement has seen great improvement in recent years, mainly through denoising, speaker separation, and dereverberation methods. Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source using an auxiliary identity network. We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time and that utilizes a compact speech representation, composed of ASR and identity features, to achieve a higher level of intelligibility. Perceptual acoustic metrics and subjective tests show that the method obtains valuable improvements over recent baselines.

Author(s) : Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Keywords : speech - enhancement - wav - methods - identity
