In this paper, we introduce a new class of data augmentation queries: join-correlation queries. We propose a sketching method that enables the construction of an index for a large number of tables. We also explore different scoring strategies that rank the query results based on how well the columns are correlated with the query. We carry out a detailed experimental evaluation, using both synthetic and real data, which shows that our sketches attain high accuracy and the scoring strategies lead to high-quality rankings. The increasing availability of structured datasets opens up opportunities to enrich analytics and improve machine learning models through relational data augmentation.
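
To make the idea of a join-correlation query concrete, here is a minimal brute-force sketch in Python. It is not the paper's sketching method or index: it simply joins a query column with each candidate column on their shared keys, computes the Pearson correlation of the joined values, and ranks candidates by absolute correlation. The function names (`pearson`, `join_correlation`, `rank_candidates`), the choice of Pearson correlation as the score, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def join_correlation(query, candidate):
    """Join two {key: value} columns on their shared keys, then correlate."""
    shared = sorted(query.keys() & candidate.keys())
    return pearson([query[k] for k in shared], [candidate[k] for k in shared])


def rank_candidates(query, candidates):
    """Rank candidate columns by absolute correlation after the join."""
    scored = [(name, join_correlation(query, col)) for name, col in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: a query column and two candidate columns keyed by month.
query = {"2020-01": 10.0, "2020-02": 20.0, "2020-03": 30.0, "2020-04": 40.0}
candidates = {
    "rainfall": {"2020-01": 1.0, "2020-02": 2.0, "2020-03": 3.0, "2020-04": 4.0},
    "noise": {"2020-01": 5.0, "2020-02": 1.0, "2020-03": 4.0, "2020-05": 9.0},
}
ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Computing the full join for every candidate, as this sketch does, becomes expensive over a large collection of tables, which is the cost the abstract's sketch-based index is meant to avoid.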

Author(s) : Aécio Santos, Aline Bessa, Fernando Chirigati, Christopher Musco, Juliana Freire

Links : PDF - Abstract

Code :

https://github.com/arjunsesh/lrr-neurips


Keywords : queries - scoring - correlation - authors - models
