AbstractsComputer Science

Streaming Graph Partitioning

by Zainab Abbas




Institution: KTH Royal Institute of Technology
Department:
Year: 2016
Keywords: Engineering and Technology; Teknik och teknologier; Teknologie masterexamen - Distribuerade system; Master of Science - Distributed Computing; Datalogi; Computer Science
Posted: 02/05/2017
Record ID: 2121221
Full text PDF: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-190895


Abstract

Graph partitioning is considered to be a standard solution to process huge graphs efficiently when processing them on a single machine becomes inefficient due to its limited computation power and storage space. In graph partitioning, the whole graph is divided among different computing nodes that process the graph in parallel. During the early stages of research done on graph partitioning, different offline partitioning methods were introduced; these methods create high computation cost as they process the whole graph prior to partitioning. Therefore, an online graph partitioning method called as streaming graph partitioning was introduced later to reduce the computation cost by assigning the edges or vertices on-the-fly to the computing nodes without processing the graph before partitioning. In our thesis, we presented an experimental study of different streaming graph partitioning methods that use two partitioning techniques: vertex partitioning and edge partitioning. Edge partitioning has proved good for partitioning highly skewed graphs. After implementing different partitioning methods, we have proposed a partitioning algorithm that uses degree information of the vertices. Furthermore, we measured the effect of different partitioning methods on the graph stream processing algorithms. Our results show that for vertex partitioning Fennel has performed better than Linear Greedy as it shows lower edge-cuts and better load balancing. Moreover, for edge partitioning, the Degree based partitioner has performed better than Least Cost Incremental and Least Cost Incremental Advanced in reducing the replication factor, but the Degree based partitioner does not do well in load balancing. In the end, we show that the custom partitioning methods, compared to default hash partitioning, save the memory space by reducing the size of aggregate states during execution of different graph processing algorithms on the resulting partitions. The Degree based partitioner performed well by reducing the size of aggregate states on average up to 50%. Other algorithms include: Fennel, Linear Greedy, Least Cost Incremental and Least Cost Incremental Advanced, they reduced the size of aggregate states on average up to 21%, 10%, 27% and 48%.