Fast In-Memory SQL Analytics on Graphs

2016-01-29
We study a class of graph analytics SQL queries, which we call relationship queries. Relationship queries are a broad superset of fixed-length graph reachability queries and of tree pattern queries. Intuitively, a relationship query discovers target entities that are reachable from source entities specified by the query. It usually also computes aggregated scores for the target entities by applying aggregation functions to measure attributes found on the target entities, the source entities and the paths from the sources to the targets. We present real-world OLAP scenarios where efficient relationship queries are needed. However, row stores, column stores and graph databases are unacceptably slow in such OLAP scenarios. We briefly comment on the straightforward extension of relationship queries that allows accessing arbitrary schemas. The GQ-Fast in-memory analytics engine uses a bottom-up, fully pipelined query execution model running on a novel data organization that combines salient features of column-based organization, indexing and compression. Furthermore, GQ-Fast compiles its query plans into executable C++ source code. Besides achieving runtime efficiency, GQ-Fast also reduces main memory requirements because, unlike column databases, it selectively allows denser forms of compression, including heavyweight compression schemes that do not support random access. We used GQ-Fast to accelerate queries for two OLAP dashboards in the biomedical field. GQ-Fast outperforms Postgres by 2-4 orders of magnitude and outperforms MonetDB and Neo4j by 1-3 orders of magnitude when all systems run in RAM. In addition, it generally saves space due to the appropriate use of compression methods.
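
To make the notion of a relationship query concrete, here is a minimal Python sketch. It is an illustration only, not GQ-Fast's implementation: the tables `doc_term` and `doc_author`, their columns, the toy rows and both function names are assumptions made up for this example. The first function states a two-hop query (term to documents to authors, summing a frequency measure found along the path) as plain SQL on SQLite. The second answers the same query by following in-memory indexes one hop at a time, which loosely mirrors the idea of pipelined, index-driven execution described in the abstract.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data: doc_term(doc, term, freq) and doc_author(doc, author).
DOC_TERM = [(1, "graph", 3), (1, "sql", 1), (2, "graph", 2), (3, "sql", 5)]
DOC_AUTHOR = [(1, "ann"), (2, "ann"), (2, "bob"), (3, "bob")]


def authors_by_sql(term):
    """Relationship query as SQL: source term -> docs -> target authors,
    with SUM(freq) as the aggregated score of each author."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE doc_term (doc INTEGER, term TEXT, freq INTEGER)")
    con.execute("CREATE TABLE doc_author (doc INTEGER, author TEXT)")
    con.executemany("INSERT INTO doc_term VALUES (?, ?, ?)", DOC_TERM)
    con.executemany("INSERT INTO doc_author VALUES (?, ?)", DOC_AUTHOR)
    rows = con.execute(
        """
        SELECT da.author, SUM(dt.freq) AS score
        FROM doc_term AS dt
        JOIN doc_author AS da ON da.doc = dt.doc
        WHERE dt.term = ?
        GROUP BY da.author
        """,
        (term,),
    ).fetchall()
    con.close()
    return dict(rows)


def authors_by_index(term):
    """Same query answered by walking in-memory indexes hop by hop,
    aggregating as each path is completed (no intermediate join table)."""
    term_to_docs = defaultdict(list)    # term -> [(doc, freq)]
    doc_to_authors = defaultdict(list)  # doc  -> [author]
    for doc, t, freq in DOC_TERM:
        term_to_docs[t].append((doc, freq))
    for doc, author in DOC_AUTHOR:
        doc_to_authors[doc].append(author)

    scores = defaultdict(int)
    for doc, freq in term_to_docs[term]:    # hop 1: term -> docs
        for author in doc_to_authors[doc]:  # hop 2: doc -> authors
            scores[author] += freq          # aggregate the path measure
    return dict(scores)


if __name__ == "__main__":
    print(authors_by_sql("graph"))
    print(authors_by_index("graph"))
```

Both functions return the same author scores for a given term. The index-based version never materializes the join result, which is the property that matters when such queries run entirely in memory.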

Related Articles

2013-07-11

There is growing interest in representing image data and feature descriptors using compact binary co…

2016-03-28

We propose a novel generic inverted index framework on the GPU (called GENIE), aiming to reduce the …

2019-05-17
1905.07113 | cs.DB

The storage manager, as a key component of the database system, is responsible for organizing, readi…

2018-08-09

Exhaustive enumeration of all possible join orders is often avoided, and most optimizers leverage he…

2017-03-28

XML query can be modeled by twig pattern query (TPQ) specifying predicates on XML nodes and XPath re…

2019-04-09
1904.04467 | cs.DB

For testing the correctness of SQL queries, e.g., evaluating student submissions in a database cours…

2018-09-01

Cloud-based data analysis is nowadays common practice because of the lower system management overhea…

2016-03-23
1603.07185 | cs.CL

We apply distributed language embedding methods from Natural Language Processing to assign a vector …

2018-09-28

Temporal text, i.e., time-stamped text data are found abundantly in a variety of data sources like n…

2018-12-28

In this paper, we propose a compact data structure to store labeled attributed graphs based on the k…

2018-07-20

Resource Description Framework (RDF) has been widely used to represent information on the web, while…

2018-04-03

Despite 25 years of research in academia, approximate query processing (AQP) has had little industri…

2019-06-08

Subgraph isomorphism is a well-known NP-hard problem that is widely used in many applications, such …

2018-09-03

This paper uses typed linear algebra (LA) to represent data and perform analytical querying in a sin…

2019-03-08

Optimizing the physical data storage and retrieval of data are two key database management problems.…