The Combined Attention Generative Adversarial Network (CAGAN) generates photo-realistic images according to textual descriptions. The proposed CAGAN utilises two attention models: word attention to draw different sub-regions conditioned on related words; and squeeze-and-excitation attention to capture non-linear interaction among channels. We demonstrate that judging a model by a single evaluation metric can be misleading by developing an additional model with local self-attention, which scores a higher Inception Score (IS) but generates unrealistic images through feature repetition. With spectral normalisation to stabilise training, the proposed network improves the state of the art on the CUB dataset and the FID
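
To make the channel-attention idea concrete, here is a minimal sketch of a generic squeeze-and-excitation step in plain Python. It is not the CAGAN implementation: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, the nested-list layout, and the omission of bias terms are all assumptions made for illustration.

```python
import math


def squeeze_excite(features, w1, w2):
    """Rescale each channel of `features`, given as [C][H][W] nested lists.

    w1 has shape [C_reduced][C] and w2 has shape [C][C_reduced]. Both are
    hypothetical weights that would normally be learned during training.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels, sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input (assumed values): two 2x2 channels and a one-unit bottleneck.
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = squeeze_excite(features, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
```

The ReLU followed by a sigmoid is what makes the interaction among channels non-linear: each gate depends on the pooled statistics of every channel, not only its own.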

Author(s) : Henning Schulze, Dogucan Yaman, Alexander Waibel

Links : PDF - Abstract

Keywords : attention - cagan - model - images - realistic
