Unsupervised text style transfer aims to alter text styles while preserving the content, without aligned data for supervision. Existing seq2seq methods face three challenges: 1) the transfer is weakly interpretable, 2) generated outputs struggle to preserve content, and 3) the trade-off between content and style is hard to control. We propose a hierarchical reinforced sequence operation method, […]

Sensor-based human activity recognition (HAR) requires predicting a person's actions from sensor-generated time series data. The current state of the art is represented by deep learning architectures that automatically learn high-level representations. We propose a novel deep learning framework, \algname, based on a purely attention-based mechanism, that overcomes the limitations of the current […]

Our contributions include a neural style algorithm based on Generative Adversarial Networks (GANs) and a working web-based application for video comixification, available at http://comixify.ii.pw.edu.pl. The final contribution of our work is a state-of-the-art keyframe extraction algorithm that selects a subset of frames from the video to provide the most comprehensive video context. We filter those frames using image aesthetic estimation […]

This paper introduces a deep-learning approach to photographic style transfer that handles a large variety of image content while faithfully transferring the reference style. We show that this approach successfully suppresses distortion and yields satisfying photorealistic style transfers in a broad variety of scenarios, including transfer of the time of day, weather, season, and artistic […]

The proposed model outperformed state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%. The proposed attention mechanism makes our model aware of which time-frequency regions of the speech spectrogram are most emotion-relevant. Notably, we observe a clear improvement with a model pre-trained on natural scene images. […]
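Attention pooling over spectrogram frames can be sketched as follows (a minimal NumPy sketch; the linear scoring vector is an illustrative assumption, not the paper's exact mechanism):

```python
import numpy as np

def attention_pool(frames, w):
    # frames: (T, F) spectrogram frames; w: (F,) scoring vector (an
    # assumed linear scorer standing in for the learned attention).
    scores = frames @ w                       # (T,) relevance per frame
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                    # softmax over time
    return alphas @ frames                    # (F,) utterance-level vector

rng = np.random.default_rng(0)
spec = rng.normal(size=(50, 40))              # 50 frames, 40 mel bins
utt = attention_pool(spec, rng.normal(size=40))
print(utt.shape)                              # (40,)
```

Frames scored as more emotion-relevant contribute more to the pooled utterance representation.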

Common Voice is a massively-multilingual collection of transcribed speech intended for speech technology research and development. The most recent release includes 29 languages, and as of November 2019 there are a total of 38 languages collecting data. Over 50,000 individuals have participated so far, resulting in 2,500 hours of collected audio. To our knowledge this […]

In this paper we address the problem of generating person images conditioned on a given pose. Given an image of a person and a target pose, we synthesize a new image of that person in the novel pose… In order to deal with pixel-to-pixel misalignments caused by the pose differences, we introduce deformable skip connections […]

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. Here, we introduce speechVGG, a flexible, transferable feature extractor tailored for integration with deep learning frameworks for speech processing. We demonstrate the application of the pre-trained model in four speech processing tasks, including speech enhancement, language identification, speech, noise and music classification, […]

Grammatical error correction can be viewed as a low-resource sequence-to-sequence task. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style […]
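A minimal sketch of what such a noising function might look like (the drop/swap corruptions and their rates are illustrative assumptions, not the paper's actual realistic noising function):

```python
import random

def noise_sentence(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    # Corrupt a clean sentence into a synthetic "erroneous" version:
    # randomly drop tokens and swap adjacent tokens.
    rng = rng or random.Random(0)
    out = [t for t in tokens if rng.random() > p_drop]
    i = 0
    while i < len(out) - 1:
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out

clean = "she goes to school every day".split()
noisy = noise_sentence(clean)
# (noisy, clean) forms one synthetic parallel training pair
```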

Most commercial font products are in fact manually designed by following specific requirements on some attributes of glyphs. We propose a novel model, Attribute2Font, to automatically create fonts by synthesizing visually-pleasing glyph images according to user-specified attributes. To the best of our knowledge, our model is the first one in the literature which is capable […]

The proposed ResNet-50 improves top-1 accuracy from 76.3% to 82.78% on the ILSVRC2012 validation set. With these improvements, inference throughput decreases only from 536 to 312. Our approach achieved 1st place in the iFood Fine-Grained Visual Recognition Competition at CVPR 2019, and the source code and trained models are available at https://github.com/clovaai/assembled-cnn. The approach […]

We present a novel approach for the task of human pose transfer. We address the issues of limited correspondences identified between keypoints only and invisible pixels due to self-occlusion. Unlike existing methods, we propose to estimate dense and intrinsic 3D appearance flow to better guide the transfer of pixels between poses. With the appearance flow, […]

Recent progress on few-shot learning has largely relied on annotated data for meta-learning, sampled from the same domain as the novel classes. In this paper, we propose the cross-domain few-shot learning (CD-FSL) benchmark, consisting of images from diverse domains with varying similarity to ImageNet, including crop disease, satellite, and medical images. […]

This work explores the problem of generating fantastic special effects for typography. The task is quite challenging due to the diversity of text effects across different characters. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the […]

Classifying the general intent of the user utterance in a conversation, also known as Dialogue Act (DA) classification, is a key step in Natural Language Understanding (NLU) for conversational agents. While DA classification has been extensively studied in human-human conversations, it has not been sufficiently explored for the emerging open-domain automated conversational agents. We propose […]

Face recognition systems can be vulnerable to makeup presentation attacks. Attackers can apply heavy makeup in order to achieve the facial appearance of a target subject for the purpose of impersonation. The proposed detection system employs a machine learning-based classifier, which is trained with synthetically generated makeup presentations. It uses a generative adversarial network for […]

In this paper, we consider the problem of malware detection and classification based on image analysis. We convert executable files to images and apply image recognition using deep learning (DL) models. To train these models, we employ transfer learning based on existing DL models that have been pre-trained on massive image datasets. We carry out […]
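The bytes-to-image conversion step can be sketched as follows (a common recipe for this line of work; the fixed width of 64 pixels and zero-padding are assumptions for illustration, not necessarily the paper's settings):

```python
import numpy as np

def malware_to_image(raw: bytes, width: int = 64) -> np.ndarray:
    # Reshape an executable's raw bytes into a 2-D grayscale image
    # (one byte = one pixel), zero-padding the tail, so an
    # ImageNet-pre-trained CNN can be fine-tuned on it.
    buf = np.frombuffer(raw, dtype=np.uint8)
    rows = -(-len(buf) // width)              # ceiling division
    img = np.zeros(rows * width, dtype=np.uint8)
    img[:len(buf)] = buf
    return img.reshape(rows, width)

img = malware_to_image(bytes(range(130)))
print(img.shape)                              # (3, 64)
```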

We present a deep generative model for unsupervised text style transfer. Our probabilistic approach models non-parallel data from two domains as a partially observed parallel corpus. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another. Across all style transfer tasks, our approach […]

Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty […]

In recent years, deep Convolutional Neural Networks (CNNs) have broken all records in salient object detection. However, training such a deep model requires a large number of manual annotations. Our goal is to overcome this limitation by automatically converting an existing deep contour detection model into a salient object detection model without using any manual […]

Lung cancer is the leading cause of cancer-related death worldwide. Early diagnosis of pulmonary nodules in Computed Tomography (CT) chest scans provides an opportunity for designing effective treatment and making financial and care plans. In this paper, we consider the problem of diagnostic classification between benign and malignant lung nodules. We aim to learn a […]

SimpleTOD is a simple approach to task-oriented dialogue. It uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. It improves over the prior state-of-the-art by 0.49 points in joint […]
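The single-sequence recast might look like the following sketch (the segment markers and the helper name `serialize_turn` are hypothetical, not SimpleTOD's actual special tokens):

```python
def serialize_turn(context, belief, actions, response):
    # Flatten one task-oriented dialogue turn into a single token
    # sequence -- context, belief state, system actions, response --
    # so a causal LM can be trained on all sub-tasks at once.
    parts = [
        "<context> " + " ".join(context),
        "<belief> " + ", ".join(f"{d} {s} {v}" for d, s, v in belief),
        "<action> " + ", ".join(actions),
        "<response> " + response,
    ]
    return " ".join(parts)

seq = serialize_turn(
    ["user: i need a taxi to the airport"],
    [("taxi", "destination", "airport")],
    ["taxi-inform"],
    "Your taxi is booked.",
)
```

At inference time, the model generates the belief, action, and response segments left to right, conditioned on the context.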

RepDistiller is a new method for transferring representational knowledge from one neural network to another. It sets a new state of the art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. We formulate the transfer objective as contrastive learning and demonstrate that our resulting new objective outperforms other cutting-edge […]
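A simplified sketch of a contrastive transfer objective in this spirit (an InfoNCE-style batch loss where each student embedding's positive is its own teacher embedding; this is illustrative, not the paper's exact formulation):

```python
import numpy as np

def contrastive_distill_loss(student, teacher):
    # student, teacher: (B, D) embedding batches. Each student row
    # should score highest against its own teacher row (the diagonal)
    # versus the other samples in the batch (negatives).
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T                              # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal
```

Minimizing this pulls student representations toward their matching teacher representations while pushing them away from mismatched ones.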

Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models. In this paper, we propose a method for automatically generating fusion examples from raw text and present DiscoFuse, a large-scale dataset for discourse-based sentence fusion. We […]
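Automatic example generation of this flavor can be sketched by splitting a fused sentence at a discourse connective, so the halves become the input and the original sentence the target (a toy illustration; the connective list and splitting heuristic are assumptions, not DiscoFuse's pipeline):

```python
import re

def split_fused(sentence, connectives=("because", "however", "and")):
    # Find the first discourse connective and split the sentence there;
    # returns ((left, right), original) as a synthetic fusion example.
    for c in connectives:
        m = re.search(rf"\b{c}\b", sentence)
        if m:
            left = sentence[:m.start()].rstrip(" ,") + "."
            right = sentence[m.end():].lstrip()
            if right:
                right = right[0].upper() + right[1:]
            return (left, right), sentence
    return None

pair, target = split_fused("He stayed home because it was raining.")
# pair is the model input, target the fused sentence to reconstruct
```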

We present a novel Multi Relational Graph Convolutional Network (MRGCN) to model on-road vehicle behaviours from a sequence of temporally ordered frames as grabbed by a moving monocular camera. The proposed method of obtaining this encoding is shown to be specifically suited to the problem at hand, as it outperforms more complex end-to-end […]

We present a novel training framework for neural sequence models, particularly for grounded dialog generation. We leverage the recently proposed Gumbel-Softmax approximation to the discrete distribution. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding. Overall, our proposed model outperforms state-of-the-art on the VisDial dataset by a […]
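The Gumbel-Softmax approximation itself can be sketched in a few lines (a minimal NumPy version of the Jang et al. estimator; the straight-through variant and any temperature annealing schedule are omitted):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Differentiable stand-in for sampling from a categorical
    # distribution: perturb logits with Gumbel(0, 1) noise, then apply
    # a temperature-controlled softmax. As tau -> 0 the output
    # approaches a one-hot sample.
    rng = rng or np.random.default_rng(0)
    y = (logits + rng.gumbel(size=logits.shape)) / tau
    y = y - y.max()
    e = np.exp(y)
    return e / e.sum()

soft = gumbel_softmax(np.array([1.0, 2.0, 3.0]), tau=1.0)   # smooth
hard = gumbel_softmax(np.array([1.0, 2.0, 3.0]), tau=0.1)   # near one-hot
```

This keeps the sampling step differentiable, which is what allows discrete answer choices in grounded dialog to be trained end to end.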

Artistic text style transfer is the task of migrating the style from a source image to the target text to create artistic typography. The proposed method demonstrates its superiority over previous state-of-the-art methods in generating diverse, controllable and high-quality stylized text. The proposal is based on a novel bidirectional shape matching framework to establish an effective […]

Facial makeup transfer is a widely-used technology that aims to transfer the makeup style from a reference face image to a non-makeup face. The existing literature leverages an adversarial loss so that the generated faces are of high quality and as realistic as real ones, but such methods can only produce fixed outputs. Inspired by recent advances […]

This paper studies the impact of multitask and transfer learning for simple question answering. We introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more […]

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures… This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. Our primary empirical finding is […]

Motion planning algorithms are crucial for many state-of-the-art robotics applications such as self-driving cars. Existing motion planning methods become ineffective as their computational complexity increases exponentially with the dimensionality of the motion planning problem. Motion Planning Networks (MPNet) is a novel neural-network-based planning algorithm. The proposed method encodes the given workspaces directly from a […]

We present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. Reusing high-level concepts across domains and learning complex procedures are key challenges in lifelong learning. We show that a program synthesis approach that combines gradient descent with combinatorial search over programs can be a more effective response […]

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good […]
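The distillation objective behind such a student model is typically a soft-target cross-entropy with temperature (a generic Hinton-style sketch; DistilBERT's actual training combines this with other loss terms):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # The student matches the teacher's temperature-softened output
    # distribution; the T^2 factor keeps gradient magnitudes comparable
    # across temperatures.
    p_t = softmax(teacher_logits / T)             # soft targets
    log_p_s = np.log(softmax(student_logits / T))
    return -T * T * np.mean((p_t * log_p_s).sum(axis=-1))
```

In practice this soft-target term is combined with the usual hard-label cross-entropy on the training data.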

“Talking-heads attention” is a variation on multi-head attention. It includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. It leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
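The two extra projections can be sketched directly (a minimal NumPy version; batching and output projections are omitted, and as in the description above the post-softmax mixing is not renormalized):

```python
import numpy as np

def talking_heads_attention(q, k, v, P_logits, P_weights):
    # q, k, v: (H, T, d) per-head queries/keys/values.
    # P_logits, P_weights: (H, H) learned mixing matrices applied
    # across the heads dimension, before and after the softmax.
    H, T, d = q.shape
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d)       # (H, T, T)
    logits = np.einsum('hij,hg->gij', logits, P_logits)  # mix heads pre-softmax
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # softmax over keys
    w = np.einsum('hij,hg->gij', w, P_weights)           # mix heads post-softmax
    return w @ v                                          # (H, T, d)
```

With both mixing matrices set to the identity, this reduces to standard multi-head attention.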

Transfer learning aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Since the source and the target domains are usually from different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions. In real applications, the marginal and conditional distributions usually have different contributions to […]
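Marginal-distribution adaptation is often driven by a discrepancy statistic such as maximum mean discrepancy (MMD); with a linear kernel it reduces to the squared distance between source and target feature means, and conditional adaptation applies the same statistic per class. A toy sketch:

```python
import numpy as np

def mmd_linear(X, Y):
    # Linear-kernel MMD between source features X (n, d) and target
    # features Y (m, d): squared Euclidean distance of the means.
    # Adaptation methods minimize this (or a kernelized variant) to
    # align the two domains.
    return float(np.sum((X.mean(axis=0) - Y.mean(axis=0)) ** 2))
```

Weighting the marginal and conditional versions of such a statistic differently is one way to reflect their different contributions per task.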