DGIST Scholar: GStream: A Graph Streaming Processing Method for Large-scale Graphs on GPUs

Department of Electrical Engineering and Computer Science Theses Master

Cited time in webofscience

Cited time in scopus

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kim, Min Soo	-
dc.contributor.author	Seo, Hyun Seok	-
dc.date.accessioned	2017-05-10T08:51:12Z	-
dc.date.available	2015-01-12T00:00:00Z	-
dc.date.issued	2015	-
dc.identifier.uri	http://dgist.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001914717	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.11750/1387	-
dc.description.abstract	Fast processing graph algorithms for large-scale graphs becomes increasingly important as graphs become popular in a wide range of applications and the sizes of graphs are growing rapidly. Due to the relatively low cost and high computational power of GPUs, there have been many attempts to process graph applications by exploiting the massive amount of paral-lelism of GPUs. However, most of the existing methods fail to process large-scale graphs that do not fit in GPU device memory, mainly due to the irregular structure and the complex graph processing algorithms. Although the state-of-the-art method TOTEM can process large-scale graphs by partitioning a graph into two parts: the main memory part processed by CPUs and the device memory part processed by GPUs, it still has several fundamental problems such as a large amount of synchronization overhead and lack of scalability. We propose a fast and scalable parallel processing method GStream that processes large-scale graphs (e.g., billion vertices) beyond the capacity of GPU device memory very efficiently by exploiting the concept of so-called nested-loop join operation and the asynchronous GPU streams. It exploits multiple GPUs by uniformly distributing the workload among GPUs and also ex-ploits available GPU device memory by caching streaming graph data. GStream is the first scalable method in terms of both the data size and the number of GPUs, to the best of our knowledge. Extensive experimental results show that GStream consistently and significantly outperforms TOTEM in terms of absolute performance, scalability, and graph size to pro-cess, for both synthetic graphs and real graphs. ⓒ 2015 DGIST	-
dc.description.tableofcontents	Ⅰ. INTRODUCTION 1-- Ⅱ. PREMINARIES 5-- Ⅲ. GSTREAM METHOD 6-- 3.1 Basic concept of GStream 6-- 3.2 Streaming topology data 8-- 3.3 Exploiting Multiple GPUs 12-- 3.4 Caching topology data 13-- 3.5 Framework of GStream 14-- 3.6 Cost model of GStream 16-- Ⅳ. OTHER TECHNIQUES 18-- 4.1 Fine-granular parallel processing in GStream 18-- 4.2 Extending to a disk-based method 18-- Ⅴ. PERFORMANCE EVALUATION 20-- 5.1 Experimental setup 20-- 5.2 Comparison with TOTEM 21-- 5.3 Characteristics of GStream 24-- Ⅵ. RELATED WORK 27-- VII. CONCLUSIONS 29	-
dc.format.extent	35	-
dc.language	eng	-
dc.publisher	DGIST	-
dc.subject	Graph processing	-
dc.subject	Large-scale	-
dc.subject	GPU	-
dc.subject	Stream	-
dc.subject	그래프 처리	-
dc.subject	대규모	-
dc.subject	스트림	-
dc.title	GStream: A Graph Streaming Processing Method for Large-scale Graphs on GPUs	-
dc.title.alternative	GStream: GPU 에서 대규모 그래프 처리를 위한 스트리밍 방법	-
dc.type	Thesis	-
dc.identifier.doi	10.22677/thesis.1914717	-
dc.description.alternativeAbstract	큰 규모 그래프를 위한 빠른 그래프 알고리즘 처리는 다양한 분야의 어플리케이션에 있어서 그래프가 인기를 얻게 되고 그래프의 규모가 급속히 증가함에 따라 점점 중요해지고 있다. 상대적으로 낮은 가격 대비 높은 계산 능력을 가지고 있는 GPU의 대규모 병렬성을 이용하여 그래프 어플리케이션을 처리하기 위한 많은 시도가 이루어져 왔다. 그러나 현존하는 대부분의 방법들은 대개 불규칙적은 그래프 구조와 복잡한 그래프 처리 알고리즘 때문에 장치 메모리에 저장할 수 없는 큰 규모의 그래프를 처리하지 못했다. 가장 최신의 방법인 TOTEM은 큰 규모의 그래프를 메인 메모리와 장치 메모리 부분적으로 나누어 저장하여 메인 메모리에 저장된 부분은 CPU에 의해 계산되고, 장치 메모리에 저장된 부분은 GPU에 계산되는 방법으로 처리한다. TOTEM은 장치메모리 보다 큰 규모의 그래프 데이터를 처리하기는 하지만 아직 메인 메모리와 장치 메모리간의 동기화 오버헤드와 GPU의 개수 및 데이터 규모에 따른 확장성의 부족과 같은 근본적인 문제를 가지고 있다. 우리는 빠르고 GPU의 개수 및 데이터 규모에 대해 확장성을 가진 병렬 처리 방법인 GStream을 제안한다. GStream은 GPU의 장치메모리보다 큰 규모의 그래프(예, 10억개의 정점)를 중첩 루프 조인과 비동기적인 GPU 스트림을 이용함으로써 매우 효율적으로 처리한다. GStream은 GPU간에 균일한 작업 부하를 분산하는 방법으로 다수의 GPU를 활용한다. 또한 스트리밍으로 복사되는 그래프 데이터를 캐시하는 방법으로 GPU 장치 메모리의 가용공간을 활용한다. 우리가 아는 범위에서 GStream은 첫 번째 데이터 규모 및 GPU의 개수에 있어서 확장성을 가지는 방법이다. 대규모의 실험 결과는 GStream이 TOTEM보다 합성 데이터와 실제 데이터에 대해서 절대적인 성능, 확장성 그리고 그래프 규모에 있어서 일관적이며 상당히 더 나은 결과를 내는 것을 보여준다. ⓒ 2015 DGIST	-
dc.description.degree	Master	-
dc.contributor.department	Information and Communication Engineering	-
dc.contributor.coadvisor	Choe, Seung Ho	-
dc.date.awarded	2015. 2	-
dc.publisher.location	Daegu	-
dc.description.database	dCollection	-
dc.date.accepted	2015-01-12	-
dc.contributor.alternativeDepartment	대학원 정보통신융합공학전공	-
dc.contributor.affiliatedAuthor	Seo, Hyun Seok	-
dc.contributor.affiliatedAuthor	Kim, Min Soo	-
dc.contributor.affiliatedAuthor	Choe, Seung Ho	-
dc.contributor.alternativeName	서현석	-
dc.contributor.alternativeName	김민수	-
dc.contributor.alternativeName	최승호	-

Files in This Item:: 000001914717.pdf
기타 데이터 / 945.75 kB / Adobe PDF download

Appears in Collections:: Department of Electrical Engineering and Computer Science Theses Master

Show Simple Item Record

qrcode

DGIST

DGIST Scholar was built with support from the OAK distribution project by the National Library of Korea.

You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.

Library Services Team, DGIST 333. Techno Jungang-daero, Hyeonpung-myeon, Dalseong-gun, Daegu, 42988, Republic of Korea.

RSS_1.0 RSS_2.0 ATOM_1.0

DGIST Library Repository

BROWSE

DGIST

BROWSE