This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data to better predict cluster assignments. We thoroughly evaluate our framework on large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, and image benchmarks CIFAR10 and ImageNet, obtaining state-of-the-art results.
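The abstract names WTA hashing as the source of pairwise pseudo labels but does not describe the procedure. The following is a minimal sketch of generic WTA hashing, not the authors' implementation: each hash applies a random permutation to the feature vector, keeps the first `k` permuted entries, and records the position of the largest one. The function names, the `threshold` on the number of matching hash codes, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def wta_hash(features, perms, k):
    """Compute WTA hash codes.

    features: (n, d) array of embeddings.
    perms:    (H, d) array, each row a permutation of range(d).
    k:        window size; only the first k permuted dimensions are compared.
    Returns an (n, H) integer array: for each hash, the index (0..k-1) of the
    largest value inside the window.
    """
    windows = features[:, perms[:, :k]]  # shape (n, H, k)
    return windows.argmax(axis=2)        # shape (n, H)


def pairwise_pseudo_labels(codes, threshold):
    """Label a pair 1 when at least `threshold` hash codes agree, else 0.

    The threshold rule is an illustrative assumption, not taken from the paper.
    """
    matches = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)  # (n, n)
    return (matches >= threshold).astype(np.int64)


# Toy data (hypothetical): sample 1 is a rank-preserving rescaling of sample 0.
rng = np.random.default_rng(0)
d, H, k = 16, 32, 4
perms = np.stack([rng.permutation(d) for _ in range(H)])
base = rng.normal(size=(2, d))
feats = np.vstack([base[0], base[0] * 2.0 + 0.5, base[1]])

codes = wta_hash(feats, perms, k)
labels = pairwise_pseudo_labels(codes, threshold=H // 2)
```

Because WTA codes depend only on the rank order of the feature dimensions, samples 0 and 1 receive identical codes and are labelled as a positive pair. This rank-based comparison is what makes the hash insensitive to the scale of the embedding.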

Author(s) : Xuhui Jia, Kai Han, Yukun Zhu, Bradley Green



Keywords : data - multi - representation - modal - benchmarks
