ML p(r)ior | Functional Dependencies Unleashed for Scalable Data Exchange

Functional Dependencies Unleashed for Scalable Data Exchange

2016-02-01
We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be structured under a constrained target schema. However, despite their wide use, target fds' evaluation is still a bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL approach typically do not support target fds unless additional information is provided. Alternatively, DE engines that do include these dependencies typically pay the price of a significant drop in performance and scalability. In this paper, we present a novel chase-based algorithm that can efficiently handle arbitrary fds on the target. Our approach essentially relies on exploiting the interactions between source-to-target (s-t) tuple-generating dependencies (tgds) and target fds. This allows us to tame the size of the intermediate chase results, by playing on a careful ordering of chase steps interleaving fds and (chosen) tgds. As a direct consequence, we importantly diminish the fd application scope, often a central cause of the dramatic overhead induced by target fds. Moreover, reasoning on dependency interaction further leads us to interesting parallelization opportunities, yielding additional scalability gains. We provide a proof-of-concept implementation of our chase-based algorithm and an experimental study aiming at gauging its scalability with respect to a number of parameters, among which the size of source instances and the number of dependencies of each tested scenario. Finally, we empirically compare with the latest DE engines, and show that our algorithm outperforms them.
PDF

Highlights - Most important sentences from the article

Login to like/save this paper, take notes and configure your recommendations

Related Articles

2018-06-21

Suppose that two independent sets $I$ and $J$ of a graph with $\vert I \vert = \vert J \vert$ are gi… show more
PDF

Highlights - Most important sentences from the article

2019-03-21

The problem of data exchange involves a source schema, a target schema and a set of mappings from tr… show more
PDF

Highlights - Most important sentences from the article

2016-04-19

The fixed parameter tractable (FPT) approach is a powerful tool in tackling computationally hard pro… show more
PDF

Highlights - Most important sentences from the article

2017-03-08
1703.02830 | cs.LO

In our implementation of geometric resolution, the most costly operation is subsumption testing (or … show more
PDF

Highlights - Most important sentences from the article

2019-04-25

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice… show more
PDF

Highlights - Most important sentences from the article

2019-01-12

The chase procedure is a fundamental algorithmic tool in database theory with a variety of applicati… show more
PDF

Highlights - Most important sentences from the article

2018-05-17
1805.06834 | cs.LG

We present a high-dimensional analysis of three popular algorithms, namely, Oja's method, GROUSE and… show more
PDF

Highlights - Most important sentences from the article

2015-06-29
1506.08547 | cs.DS

We consider the recent formulation of the Algorithmic Lov\'asz Local Lemma [10,2,3] for finding obje… show more
PDF

Highlights - Most important sentences from the article

2016-07-16

An instance of the Constraint Satisfaction Problem (CSP) is given by a family of constraints on over… show more
PDF

Highlights - Most important sentences from the article

2016-11-02

Moser and Tardos (2010) gave an algorithmic proof of the lopsided Lov\'asz local lemma (LLL) in the … show more
PDF

Highlights - Most important sentences from the article

2018-07-23

Over the past years, there has been a resurgence of Datalog-based systems in the database community … show more
PDF

Highlights - Most important sentences from the article

2018-09-16

Vadalog is a system for performing complex reasoning tasks such as those required in advanced knowle… show more
PDF

Highlights - Most important sentences from the article

2016-11-05
1611.01647 | cs.DS

We propose a new algorithmic framework, called "partial rejection sampling", to draw samples exactly… show more
PDF

Highlights - Most important sentences from the article

2018-08-28
1808.09545 | cs.DB

Incentivized by the enormous economic profits, the data marketplace platform has been proliferated r… show more
PDF

Highlights - Most important sentences from the article

2018-07-24

In recent years, expansion-based techniques have been shown to be very powerful in theory and practi… show more
PDF

Highlights - Most important sentences from the article

2018-10-04

Existential rules, long known as tuple-generating dependencies in database theory, have been intensi… show more
PDF

Highlights - Most important sentences from the article