Information Discrepancy in Strategic Learning

We study a decision-making model where a principal deploys a scoring rule and agents strategically invest effort to improve their scores. Agents within each subgroup observe the past scores received by their peers, which allows them to construct an estimate of the scoring rule. …

Towards Enhancing Database Education Natural Language Generation Meets Query Execution Plans

The database systems course is offered as part of an undergraduate computer science degree program in many major universities. It is often daunting for a learner to comprehend query execution plans (QEPs) containing vendor-specific implementation details, hindering her learning process. In this paper, we present a novel, end-to-end, generic system called LANTERN that generates a natural language description of a QEP to facilitate understanding of the query execution steps. …

Am I a Real or Fake Celebrity Measuring Commercial Face Recognition Web APIs under Deepfake Impersonation Attack

Companies such as Microsoft, Amazon, and Naver offer highly accurate commercial face recognition web services for diverse applications to meet the end-user needs. Such technologies are threatened persistently, as virtually any individual can quickly implement impersonation attacks. These attacks can be a significant threat for authentication and identification services, which heavily rely on their underlying face recognition technologies’ accuracy and robustness. …

Polynesia Enabling Effective Hybrid Transactional Analytical Databases with Specialized Hardware Software Co Design

We propose Polynesia, a hardware-software co-designed system for in-memory HTAP databases. Polynesia outperforms three state-of-the-art HTAP systems, with average transactional/analytical throughput improvements of 1.70X/3.74X. It reduces energy consumption by 48% over the prior lowest-energy system. …

Accelerating Distributed Memory Autotuning via Statistical Analysis of Execution Paths

The prohibitive expense of automatic performance tuning at scale has largely limited the use of autotuning to libraries for shared-memory and GPU architectures. We introduce a framework for approximate autotuning that achieves a desired confidence in each algorithm configuration’s performance. This strategy is effective in the presence of frequently recurring computation and communication kernels, which is characteristic of algorithms in numerical linear algebra. …

Alignment Knowledge Distillation for Online Streaming Attention based Speech Recognition

This article describes an efficient training method for online streaming attention-based encoder-decoder (AED) automatic speech recognition (ASR) systems. CTC-synchronous training (CTC-ST) can achieve a comparable tradeoff of accuracy and latency without relying on external alignment information. The best MoChA system shows performance comparable to that of RNN-transducer (RNN-T). The proposed method provides alignment information learned in the CTC branch to the attention-based decoder. …

Contrastive Separative Coding for Self supervised Representation Learning

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach. The key finding is that such representations can be learned by separating the target signal from contrastive interfering signals. The experiment demonstrates that our approach can learn useful representations, achieving strong speaker verification performance in adverse conditions. …

CogDL An Extensive Toolkit for Deep Learning on Graphs

Graph representation learning aims to learn low-dimensional node embeddings for graphs. It is used in several real-world applications such as social network analysis and large-scale recommender systems. CogDL is an extensive research toolkit for deep learning on graphs that allows researchers and developers to easily conduct experiments and build applications. …

Narratives and Counternarratives on Data Sharing in Africa

As machine learning and data science applications grow ever more prevalent, there is an increased focus on data sharing and open data initiatives in the context of the African continent. Many argue that data sharing can support research and policy design to alleviate poverty, inequality, and derivative effects in Africa. …

Persistent Message Passing

Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithms and data structures. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with the capability of querying past state by explicitly persisting it. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, outperforming GNNs which overwrite states. …

Preferential attachment hypergraph with high modularity

Few models for generating random hypergraphs exist, and no general model preserves both a power-law degree distribution and a high modularity indicating the presence of communities. We present a dynamic preferential attachment hypergraph model which features a partition into communities. …

SDN based Self Configuration for Time Sensitive IoT Networks

This paper introduces an SDN-based self-configuration framework for the fully automated configuration of TSN networks. Unlike standard TSN configuration, we remove end-host-related dependencies and put flows initially on default paths to extract traffic patterns by monitoring network traffic at edge switches. These characteristics allow the flows to be moved to optimal paths while maintaining hard real-time guarantees, for which we also formulate an optimization problem. …

PyCG Practical Call Graph Generation in Python

Call graphs play an important role in different contexts, such as profiling and vulnerability propagation analysis. We propose a pragmatic, static approach for call graph generation in Python. We compute all assignment relations between program identifiers of functions, variables, classes, and modules. …
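To make the idea of a static call graph concrete, the following is a minimal sketch using Python's standard `ast` module. It is not PyCG's algorithm: it only records calls made through bare names and does not track assignment relations, and the function name `naive_call_graph` is a hypothetical helper introduced for illustration.

```python
import ast
from collections import defaultdict


def naive_call_graph(source):
    """Map each function name to the sorted bare names it calls.

    Assumption: a deliberately naive illustration, not PyCG's method.
    Aliases, attribute calls, and classes are ignored, and calls inside
    nested functions are also attributed to the enclosing function.
    """
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # make sure functions without calls appear too
            for inner in ast.walk(node):
                # Only direct calls through a plain identifier are recorded.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return {name: sorted(callees) for name, callees in graph.items()}


EXAMPLE = """
def g():
    pass

def f():
    h = g
    h()
    g()
"""

if __name__ == "__main__":
    print(naive_call_graph(EXAMPLE))
```

In the example, the call `h()` is recorded under the alias `h` rather than resolved to `g`. Resolving such aliases is the kind of problem that tracking assignment relations between identifiers is meant to address.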

SWP Microsecond Network SLOs Without Priorities

The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. …

AdaSpeech Adaptive Text to Speech for Custom Voice

Custom voice aims to adapt a source TTS model to synthesize personal voice for a target speaker using little speech data. AdaSpeech achieves much better adaptation quality than baseline methods, with only about 5K specific parameters for each speaker. We pre-train the source TTS model on LibriTTS datasets and fine-tune it on VCTK and LJSpeech datasets with little adaptation data, e.g.,…

Direct guaranteed lower eigenvalue bounds with optimal a priori convergence rates for the bi Laplacian

An extra-stabilised Morley finite element method (FEM) directly computes lower eigenvalue bounds with optimal a priori convergence rates for bi-Laplace Dirichlet eigenvalues. The analysis is based on the Worsey-Farin 3D version of the Hsieh-Clough-Tocher macroelement with a careful selection of center points in a further decomposition of each tetrahedron into 12 sub-tetrahedra. …

Understanding Predicting User Lifetime with Machine Learning in an Anonymous Location Based Social Network

Jodel’s location-based nature leads to the establishment of disjoint communities country-wide. Data from the Kingdom of Saudi Arabia make it possible, for the first time, to study user lifetime in this setting. A user’s lifetime is an important measurement for evaluating and steering customer bases, as it can be leveraged to predict churn and possibly apply suitable methods to circumvent potential losses. …

COVID 19 vs Social Media Apps Does Privacy Really Matter

Many people around the world are worried about using or even downloading COVID-19 contact tracing mobile apps. The main reported concerns center on privacy and ethical issues. People are voluntarily using social media apps at a significantly higher rate during the pandemic, without similar privacy concerns compared with COVID-19 apps. …

A simple method for improving the accuracy of Chung Lu random graph generation

The Chung-Lu model is widely used to generate null-graph models with expected degree sequences as well as implicitly define network measures such as modularity. We introduce a simple method for improving the accuracy of Chung-Lu graph generation. Our method uses a Poisson approximation to define a linear system describing the expected degree sequence to be output from the model using standard generation techniques. …
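As background for why accuracy is an issue, here is a minimal sketch of the basic Chung-Lu rule for simple graphs, where the edge probability is p_ij = min(1, w_i w_j / sum(w)). It is the textbook baseline only, not the correction method summarized above, and the function names are hypothetical helpers introduced for illustration.

```python
import random


def edge_probabilities(weights):
    """Return {(i, j): p_ij} for i < j under the basic Chung-Lu rule.

    Assumption: simple graphs with no self-loops or multi-edges, and
    probabilities clipped at 1.
    """
    total = sum(weights)
    n = len(weights)
    return {
        (i, j): min(1.0, weights[i] * weights[j] / total)
        for i in range(n)
        for j in range(i + 1, n)
    }


def expected_degrees(weights):
    """Expected degree of each node, summing the probabilities of its edges."""
    degrees = [0.0] * len(weights)
    for (i, j), p in edge_probabilities(weights).items():
        degrees[i] += p
        degrees[j] += p
    return degrees


def sample_graph(weights, seed=0):
    """Draw one graph by flipping an independent coin for each node pair."""
    rng = random.Random(seed)
    return [edge for edge, p in edge_probabilities(weights).items() if rng.random() < p]


if __name__ == "__main__":
    target = [2, 2, 2, 2]
    print(expected_degrees(target))  # each entry is 1.5, below the target of 2
    print(sample_graph(target))
```

With four nodes of target degree 2, every pair has probability 0.5, so each node's expected degree is 1.5 rather than 2. Excluding self-loops is what pulls the realized expectation below the target in this small case.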

ReLIC Reduced Logic Inference for Composition for Quantifier Elimination based Compositional Reasoning and Verification

The paper presents our research on quantifier elimination (QE) for compositional reasoning and verification. For compositional reasoning, QE provides the foundation of our approach, serving as the calculus for composition to derive the strongest system-property in a single step. We developed a new prototype verifier named ReLIC (Reduced Logic Inference for Composition) that implements these approaches. …

Dynamic Stochastic Blockmodel Regression for Network Data Application to International Militarized Conflicts

A goal of social science research is to understand how latent group memberships predict the dynamic process of network evolution. We develop a dynamic model of network data by combining a hidden Markov model with a mixed-membership stochastic block model. Changes in monadic covariates like democracy shift states between coalitions, generating heterogeneous effects on conflict over time and across states. …

Online Partial Service Hosting at the Edge

We consider the problem of service hosting where an application provider can dynamically rent edge computing resources and serve user requests from the edge to deliver a better quality of service. A key novelty of this work is that we allow the service to be hosted partially at the edge, which enables a fraction of the user query to be served by the edge. …

Query Rewriting via Cycle Consistent Translation for E Commerce Search

The proposed method rewrites hard user queries into more standard queries that are more appropriate for the inverted index to retrieve. A/B experiments show that it significantly improves core e-commerce business metrics. The proposed model has been launched into our search engine production, serving hundreds of millions of users since the summer of 2020. …

Fair and Efficient Allocations with Limited Demands

We study the fair division problem of allocating multiple resources among agents with Leontief preferences that are each required to complete a finite amount of work. We examine the behavior of the classic Dominant Resource Fairness (DRF) mechanism in this setting and show it is fair but only weakly Pareto optimal and inefficient in many natural examples. …
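For readers unfamiliar with DRF, the following is a minimal sketch of a discrete, task-by-task variant of the classic mechanism: repeatedly give one more task to the agent with the smallest dominant share whose next task still fits. The optional `max_tasks` cap is a hypothetical stand-in for a finite amount of work; it is not the mechanism analysed in the paper, and skipping agents whose task no longer fits is a simplification.

```python
from fractions import Fraction


def drf_allocate(capacity, demands, max_tasks=None):
    """Allocate whole tasks by progressive filling on dominant shares.

    capacity:  total amount of each resource, e.g. [9, 18]
    demands:   per-task demand vector for each agent, e.g. {"A": [1, 4]}
    max_tasks: optional cap on tasks per agent (hypothetical parameter
               modelling limited demand)
    """
    max_tasks = max_tasks or {}
    used = [0] * len(capacity)
    tasks = {agent: 0 for agent in demands}

    def dominant_share(agent):
        # Largest fraction of any single resource the agent currently holds.
        return max(
            Fraction(tasks[agent] * d, c) for d, c in zip(demands[agent], capacity)
        )

    def can_take_one_more(agent):
        if agent in max_tasks and tasks[agent] >= max_tasks[agent]:
            return False
        return all(u + d <= c for u, d, c in zip(used, demands[agent], capacity))

    while True:
        candidates = [a for a in demands if can_take_one_more(a)]
        if not candidates:
            return tasks
        # Ties are broken by insertion order of the demands dict.
        chosen = min(candidates, key=dominant_share)
        tasks[chosen] += 1
        used = [u + d for u, d in zip(used, demands[chosen])]


if __name__ == "__main__":
    print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))
```

With 9 CPUs and 18 GB of memory, agent A (1 CPU, 4 GB per task) ends with 3 tasks and agent B (3 CPUs, 1 GB per task) with 2, so both hold a dominant share of 2/3. Capping A at one task leaves memory idle that B cannot use, a small example of how limited demands can leave resources unused.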

Computing the Information Content of Trained Neural Networks

The number of weights is usually a bad proxy for the actual amount of information stored. This paper derives both a consistent estimator and a closed-form upper bound on the information content of infinitely wide neural networks. The bounds have a simple dependence on both the network architecture and the training data. …

A survey on Variational Autoencoders from a GreenAI perspective

Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks to efficiently solve the generation problem for high-dimensional data. The key insight of VAEs is to learn the latent distribution of data in such a way that new meaningful samples can be generated from it. …

High Performance Training by Exploiting Hot Embeddings in Recommendation Systems

Recommendation models are commonly used learning models that suggest relevant items to a user for e-commerce and online advertisement-based applications. Current recommendation models include deep-learning-based (DLRM) and time-based sequence (TBSM) models. Some training inputs and their accesses into the embedding tables are heavily skewed, with certain entries being accessed up to 10000x more. …