Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, do not handle utterance-specific uncertainty. In this work, we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry and additional non-linear information. This is achieved by incorporating a 2nd-stage neural network (known as a decision network) as part of an end-to-end training regimen. We observed significant performance gains for the two techniques. We propose the concept of decision residual networks.
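As a rough illustration of the idea, the sketch below scores an enrollment/test embedding pair with a cosine baseline and adds a correction from a small second-stage network. The abstract does not specify the architecture, so everything here is an assumption: the function names, the single hidden layer, the input features (both embeddings plus the cosine score), and the zero-initialised output layer are all hypothetical, and the weights are untrained rather than learned end to end.

```python
import numpy as np


def cosine_score(enroll, test):
    """Baseline score: cosine similarity between two embeddings."""
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def init_decision_net(dim, hidden=16, seed=0):
    """Hypothetical compact decision network with one hidden layer.

    The output weights start at zero, so the network initially adds
    nothing to the cosine baseline (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(scale=0.1, size=(2 * dim + 1, hidden)),
        "b1": np.zeros(hidden),
        "w2": np.zeros(hidden),
        "b2": 0.0,
    }


def decision_residual_score(params, enroll, test):
    """Cosine score plus a non-linear residual from the decision network."""
    base = cosine_score(enroll, test)
    # Enrollment and test embeddings occupy separate input positions, so the
    # network can treat them asymmetrically (assumed feature layout).
    features = np.concatenate([enroll, test, [base]])
    hidden = np.tanh(features @ params["w1"] + params["b1"])
    residual = float(hidden @ params["w2"] + params["b2"])
    return base + residual
```

With the zero-initialised output layer, `decision_residual_score` returns exactly the cosine score; training would then only need to learn whatever the cosine baseline misses. Because the two embeddings enter at different input positions, swapping them generally changes the score once the output weights are non-zero.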

Author(s) : Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno

Keywords : network - speaker - decision - linear
