AbstractsComputer Science

Efficient processing of advanced structural queries in graph databases

by Xiang Zhao




Institution: University of New South Wales
Department: Computer Science & Engineering
Year: 2014
Record ID: 1051896
Full text PDF: http://handle.unsw.edu.au/1959.4/53213


Abstract

Recent decades witnessed a rapid proliferation of graph data, such as chemical structures and business processes. Structural queries are frequently issued in these domains, and hence, attract extensive attention. Due to data inconsistency and noise, a recent trend is to study similarity queries. Moreover, graphs may possess multiple attributes on vertices, imposing additional challenges. Hence, indexing and processing methods for answering advanced structural queries are strongly in demand. The thesis studies three types of emerging structural queries in large-scale graph databases, including graph similarity join, graph similarity search and graph substructure selection. Graph similarity join retrieves all similarity pairs from two graph collections such that their similarity satisfies a given threshold. Particularly, we incorporate graph edit distance as similarity measure. Inspired by the q-gram idea for string similarity queries, we introduce path-based q-grams on graphs, and establish a count filtering lower bound for generating candidates. An efficient algorithm is proposed to exploit both matching and mismatching features. We also present label and degree-based matching conditions to strengthen the filtering techniques. Leveraging offline index, we investigate efficient graph similarity search. Existing solutions employ fixed-size overlapping substructures, and thus, are susceptible to large vertex degrees and thresholds. Disparately, we present a partition-based approach by dividing graphs into variable-size non-overlapping partitions. The edit distance constraint is converted to a containment constraint for candidate generation. A dynamic pruning technique and an improved verification algorithm are also developed. In addition, a cost-aware graph partitioning technique is conceived to optimize the index. Equipping structural queries with rich semantics, substructure selection over multi-attributed graphs finds all subgraph isomorphic mappings such that each pair of matching vertices satisfy a set of selection conditions, each against an equality, range, or set containment operator on a vertex attribute. Existing techniques for single-labeled graphs are developed under the assumption of identical label matching, and thus, cannot handle the general cases. To this end, we propose a two-tier index via judiciously materializing feature mappings. We also devise a scalable indexing and query processing method exploiting execution plans with minimum cost.