Order Constraints in Optimal Transport

Recent works have aimed to improve optimal transport plans through the introduction of various forms of structure. We introduce novel order constraints into the optimal transport formulation to allow for structure. While there are now quadratically many constraints as before, we prove an approximate solution…

Delphi Towards Machine Ethics and Norms

What would it take to teach a machine to behave ethically? While broad ethical rules may seem straightforward to state (“thou shalt not kill”), applying such rules to real-world situations is far more complex. We present Commonsense NormBank, a moral textbook customized for machines, which compiles 1.7M examples of people’s ethical judgments on a broad spectrum of everyday situations.…

P-Adapters Robustly Extracting Factual Information from Language Models with Diverse Prompts

P-Adapters are lightweight models that sit between the embedding layer and first attention layer of Large Language Models (LLMs). They take LLM embeddings as input and output continuous prompts that are used to query the LLM. They show between 12-26% absolute improvement in precision and 36-50% absolute improvement in consistency over a baseline that uses only natural language queries.…

Towards More Effective and Economic Sparsely Activated Model

Sparsely-activated models have achieved great success in natural language processing through large-scale parameters and relatively low computational cost. Due to the limit of communication cost, activating multiple experts is hardly affordable during training and inference. To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU). Our methods shed light on the training of extremely large sparse models, and experiments show that our models can achieve significant performance gain with great efficiency improvement.…

Capacity of Group-invariant Linear Readouts from Equivariant Representations How Many Objects can be Linearly Classified Under All Possible Views

Equivariance has emerged as a desirable property of representations of objects subject to identity-preserving transformations that constitute a group. However, the expressivity of a representation constrained by group equivariance is still not fully understood. We provide a generalization of Cover’s Function Counting Theorem that quantifies the number of linearly separable and group-invariant binary dichotomies that can be assigned to equivariant representations.…
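For orientation, the classical form of Cover's Function Counting Theorem (not the group-invariant generalization the abstract refers to) states that P points in general position in R^N admit C(P, N) = 2 * sum_{k=0}^{N-1} binom(P-1, k) dichotomies separable by a hyperplane through the origin. The following is a minimal sketch of that classical count only; the function name `cover_count` is an illustrative choice, not something taken from the paper.

```python
from math import comb


def cover_count(num_points: int, dim: int) -> int:
    """Classical Cover count: number of dichotomies of `num_points` points in
    general position in R^dim that are separable by a hyperplane through the
    origin. This is the textbook theorem, not the paper's generalization."""
    if num_points < 1 or dim < 1:
        raise ValueError("num_points and dim must be positive integers")
    # math.comb(n, k) returns 0 when k > n, so the sum is safe for dim > num_points.
    return 2 * sum(comb(num_points - 1, k) for k in range(dim))


if __name__ == "__main__":
    # When num_points <= dim, every one of the 2**num_points dichotomies is separable.
    print(cover_count(3, 3))
    # At num_points = 2 * dim, exactly half of all dichotomies are separable.
    print(cover_count(4, 2), 2 ** 4 // 2)
```

Two sanity checks follow directly from the formula: all dichotomies are separable while the number of points does not exceed the dimension, and exactly half are separable when the number of points is twice the dimension.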

HUMAN4D A Human Centric Multimodal Dataset for Motions and Immersive Media

We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, we provide a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities.…

DeepMoCap Deep Optical Motion Capture Using Multiple Depth Sensors and Retro Reflectors

A marker-based, single-person optical motion capture method (DeepMoCap) is proposed using multiple spatio-temporally aligned infrared-depth sensors and retro-reflective straps and patches (reflectors). DeepMoCap explores motion capture by automatically localizing and labeling reflectors on depth images and, subsequently, in 3D space. The subject’s motion is efficiently captured by applying a template-based fitting technique on the extracted optical data.…

RGB-D Image Inpainting Using Generative Adversarial Network with a Late Fusion Approach

Diminished reality is a technology that aims to remove objects from video images and fill in the missing region with plausible pixels. We propose an RGB-D image inpainting method using a generative adversarial network, which does not require multiple cameras. We propose a late fusion approach that exploits the complementary advantages of RGB and depth information to jointly restore the texture and geometry of missing regions from a pair of RGB and depth images. The experimental results verify the effectiveness of our proposed method.…

Learning Temporal 3D Human Pose Estimation with Pseudo Labels

We present a simple, yet effective, approach for self-supervised 3D human pose estimation. During training, we rely on triangulating 2D body pose estimates of a multiple-view camera system. A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton.…

Symbolic Knowledge Distillation from General Language Models to Commonsense Models

The common practice for training commonsense models has gone from-human-to-machine: humans author commonsense knowledge graphs in order to train models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs. We also distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type, a commonsense model.…

The Irrationality of Neural Rationale Models

Neural rationale models are popular for interpretable predictions of NLP tasks. In these, a selector extracts segments of the input text, called rationales, and passes these segments to a classifier for prediction. We call for more rigorous evaluations of these models to ensure desired properties of interpretability are achieved.…

Domain Adaptation on Semantic Segmentation with Separate Affine Transformation in Batch Normalization

In recent years, unsupervised domain adaptation (UDA) for semantic segmentation has attracted the attention of many researchers. The proposed Separate Affine Transformation (SEAT) is simple, easily implemented and easy to integrate into existing adversarial learning based UDA methods. We introduce multi-level adaptation by adding the lower-level features to the higher-level ones before feeding them to the discriminator, without adding extra discriminators as other methods do.…

NeRS Neural Reflectance Surfaces for Sparse view 3D Reconstruction in the Wild

NeRS learns a neural shape representation of a closed surface that is diffeomorphic to a sphere, guaranteeing water-tight reconstructions. Surface parameterizations allow NeRS to learn (neural) bidirectional surface reflectance functions (BRDFs) that factorize view-dependent appearance into environmental illumination, diffuse color (albedo), and specular “shininess”. The project page with code and visualizations can be found at https://jasonyzhang.com/ners…

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. We use sub-word units for lip reading for the first time to better model the ambiguities of the task. Our best lip reading model achieves 22.6% word error rate on the LRS2 dataset, a performance unprecedented for lip-reading models, significantly reducing the performance gap between lip reading and automatic speech recognition.…
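The word error rate quoted above is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition under simple assumptions (whitespace tokenization, no text normalization); it is not the evaluation code used for the LRS2 result, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.
    Assumes whitespace tokenization and no normalization (illustrative only)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.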

The Neural MMO Platform for Massively Multiagent Research

Neural MMO is a computationally accessible research platform that combines large agent populations, long time horizons, open-ended tasks, and modular game systems. We present Neural MMO as free and open-source software with active support, ongoing development, documentation, and training, logging, and visualization tools to help users adapt to the new setting.…

Practical Benefits of Feature Feedback Under Distribution Shift

In experiments addressing sentiment analysis, we show that feature feedback methods perform significantly better on various natural out-of-domain datasets even absent differences on in-domain evaluation. By contrast, on natural language inference tasks, performance remains comparable. We hypothesize that while existing methods for incorporating feature feedback have delivered negligible in-sample gains, they may nevertheless generalize better out-of-domain.…

HAVEN Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism

Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents. We propose a novel value decomposition framework, HAVEN, based on hierarchical reinforcement learning for fully cooperative multi-agent problems. Our method is demonstrated to achieve superior results to many baselines on StarCraft II micromanagement tasks and offers an efficient solution to multi-agent hierarchical reinforcement learning in fully cooperative scenarios.…

Compressibility of Distributed Document Representations

We propose CoRe, a straightforward, representation learner-agnostic framework suitable for representation compression. CoRe’s performance was studied on a collection of 17 real-life corpora from biomedical, news, social media, and literary domains. We explored the behavior when considering contextual and non-contextual document representations, different compression levels, and 9 different compression algorithms.…

Spoken ObjectNet A Bias Controlled Spoken Caption Dataset

Modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet to remove some of these biases. This dataset expands upon ObjectNet, which is a bias-controlled image dataset. We detail our data collection pipeline, which features several methods to improve caption quality, including automated language model checks.…

BI RADS BERT Using Section Tokenization to Understand Radiology Reports

Domain-specific contextual word embeddings have been shown to achieve impressive accuracy at natural language processing tasks in medicine. Radiology reports are the main form of communication between radiologists and clinicians, and contain important information for patient care. We then evaluated whether using section tokenization improved the downstream extraction of the following fields: modality/procedure, previous cancer, menopausal status, purpose of exam, breast density and background parenchymal enhancement.…

Comparative Opinion Summarization via Collaborative Decoding

Opinion summarization focuses on generating summaries that reflect popular opinions of multiple reviews for a single entity. We propose a task to generate two contrastive summaries and one common summary from two given sets of reviews from different entities. We developed a comparative summarization framework, CoCoSum, which consists of two few-shot summarization models that are jointly used to generate contrastive and common summaries.…

Query and Extract Refining Event Extraction as Type oriented Binary Decoding

Event extraction is typically modeled as a multi-class classification problem. We propose a novel event extraction framework that takes event types and argument roles as natural language queries to extract candidate triggers and arguments from the input text. Experiments on two public benchmarks, ACE and ERE, demonstrate that our approach achieves state-of-the-art performance on each dataset and significantly outperforms existing methods on zero-shot event extraction.…

Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition

Nested entities are observed in many domains due to their compositionality, and they cannot be easily recognized by the widely-used sequence labeling framework. A natural solution is to treat the task as a span classification problem. To increase performance on span representation and classification, it is crucial to effectively integrate all useful information of different formats.…

Unrolled Variational Bayesian Algorithm for Image Blind Deconvolution

In this paper, we introduce a variational Bayesian algorithm (VBA) for image blind deconvolution. Our generic framework incorporates smoothness priors on the unknown blur/image and possible affine constraints (e.g., sum to one) on the blur kernel. One of our main contributions is the integration of VBA within a neural network paradigm, following an unrolling methodology.…

Playing for 3D Human Recovery

Image- and video-based 3D human recovery (i.e. pose and shape estimation) has achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity. In this work, we obtain massive human sequences as well as their 3D ground truths by playing video games.…
