The one of characteristics is simpeness of structure that provides facile data analysis. The importance of graph processing is growing due to that. However the handling on large-scale graph using existing methods becomes harder due to the increase of data size. The distributed graph system requires a lot of machine for scalability. However network overhead increases linearly and is larger than processing time. Meanwhile GPU is used as coprocessor for handling graph data via thousands of GPU cores. However all of existing methods using GPU fail to process large-scale graphs due to lack of main memory. In this paper, we propose the method GStream 2.0 that handles large-scale graphs which is larger than main memory using a single machine. Proposed method is streaming the graph data from secondary storage to GPU memory and launches graph algorithm on GPU device. Through streaming graph data via slotted page format which is partitioning graph data with fixed size, proposed method handles large-scale graph which is larger than GPU device memory. Performance is improved by avoiding intermediate data transfer and proposed method is using the array for result of algorithm which is sorted in GPU device memory. For supporting graph algorithm based on edge attribute, we extend slotted page format. Finally, we propose the extended model for 2-hop neighborhood algorithm. Through experiments, we show that GStream 2.0 significantly outperforms the major methods and the state-of-the-art methods. ⓒ 2017 DGIST
Table Of Contents
Ⅰ. INTRODUCTION 1-- Ⅱ. PRELIMINARIES 5-- Ⅲ. MAIN CONCEPT OF GSTREAM 2.0 7-- 3.1 Main concept of GStream 2.0 7-- 3.2 Asynchronous multiple streaming 9-- 3.3 Major types of graph algorithms 11-- 3.4 Algorithm of the framework 12-- Ⅳ. THE STRATEGY OF GSTREAM 2.0 16-- 4.1 Strategy for performance 16-- 4.2 Strategy for scalability 18-- Ⅴ. EXTENDED SLOTTED PAGE FORMAT 21-- 5.1 Extended slotted page format for trillion-scale graph 21-- 5.2 Extended slotted page format for edge attributes 22-- Ⅵ. EXTENDED PROCESSING MODEL OF GSTREAM 2.0 24-- Ⅶ. EXPERIMENTAL EVALUATION 26-- 7.1 Experimental setup 26-- 7.2 Comparison with Distributed Methods 29-- 7.3 Comparison with CPU-based Methods 29-- 7.4 Comparison with GPU-based Methods 31-- 7.5 Characteristics of GStream 2.0 33-- 7.6 Additional graph algorithms 35-- 7.7 Experiment for edge attribute 36-- Ⅷ. CONCLUSION 37