DGIST Scholar: MapReduce Architecture for a Single Computing Node of Multiprocessors

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Kim, Min Soo	-
dc.contributor.author	Song, Hyo Chan	-
dc.date.accessioned	2017-05-10T08:49:53Z	-
dc.date.available	2016-05-18T00:00:00Z	-
dc.date.issued	2013	-
dc.identifier.uri	http://dgist.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002262489	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.11750/1328	-
dc.description.abstract	Recently, the paradigm of CPU micro-architecture design has been shifting to on-chip multi-core processors and, moreover, to many-core coprocessors for general-purpose computing such as NVIDIA’s Tesla and Intel’s Xeon Phi. Meanwhile, the MapReduce framework, which typically runs on a large cluster of cheap commodity nodes, has been extensively used and studied for big data analysis. We propose a new MapReduce framework called Hybrid-core based big Data (Real-time) Analysis (HYDRA) that regards a single node equipped with both multi-core CPUs and many-core GPUs as a cluster of nodes, where each processor plays the role of a single node. By fully exploiting the computing power of modern heterogeneous-core systems, HYDRA can achieve performance comparable to that of a small-scale cluster of nodes. In particular, HYDRA is based on a shared-memory architecture and so incurs no cost for transferring data over the network in the shuffle step of MapReduce, whereas conventional MapReduce can incur a large cost in that step depending on the kind of task. Under the proposed framework, we propose two strategies, “Processor As A Node” (PAAN) and “GPU Mapper CPU Reducer” (GMCR). PAAN treats each multiprocessor, whether CPU or GPU, as a node in the same way. GMCR, in contrast, uses GPUs only as mapper nodes and CPUs only as reducer nodes. The proposed strategies tackle challenging issues such as how to make the two types of processors (i.e., CPUs and GPUs) cooperate, how to manage their different memory hierarchies, and how to minimize data communication overhead between CPUs and GPUs. Extensive experimental results show that HYDRA outperforms conventional MapReduce on a cluster of eight commodity nodes by more than 14 times in the best case. ⓒ 2013 DGIST	-
dc.description.tableofcontents	Ⅰ. INTRODUCTION 1 -- Ⅱ. BACKGROUND 4 -- 2.1 Notations 4 -- 2.2 MapReduce 4 -- 2.3 General Purpose computing on Graphics Processing Unit (GPGPU) 7 -- Ⅲ. THE HYDRA SYSTEM 9 -- 3.1 System Overview 9 -- 3.2 Processor As A Node (PAAN) 9 -- 3.2.1 Strategy Architecture 10 -- 3.2.2 CPU Node Workflow 13 -- 3.2.3 GPU Node Workflow 13 -- 3.3 GPU Mapper CPU Reducer (GMCR) 17 -- 3.3.1 Strategy Architecture 18 -- 3.3.2 GPU Mapper Workflow 19 -- 3.3.3 CPU Reducer Workflow 22 -- Ⅳ. EVALUATION 25 -- 4.1 Experimental Setup 25 -- 4.2 Application – Word Count 25 -- 4.3 Performance Evaluation 26 -- Ⅴ. RELATED WORK 30 -- 5.1 MapReduce Framework with the CPU 30 -- 5.2 MapReduce Framework with the Accelerators 30 -- 5.3 Programming Tools for the GPGPU 32 -- Ⅵ. CONCLUSIONS 33	-
dc.format.extent	41	-
dc.language	eng	-
dc.publisher	DGIST	-
dc.subject	MapReduce	-
dc.subject	Heterogeneous computing	-
dc.subject	GPGPU	-
dc.subject	multicore	-
dc.subject	manycore	-
dc.title	MapReduce Architecture for a Single Computing Node of Multiprocessors	-
dc.title.alternative	MapReduce Architecture on a Single Computing Node Composed of Multiprocessors	-
dc.type	Thesis	-
dc.identifier.doi	10.22677/thesis.2262489	-
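dc.description.alternativeAbstract	The recent paradigm of CPU micro-architecture design is shifting to on-chip multi-core processors and to many-core coprocessors such as NVIDIA’s Tesla and Intel’s Xeon Phi. Meanwhile, the MapReduce framework is widely used and studied for big data analysis on large clusters of low-cost nodes. This thesis proposes a new MapReduce framework called Hybrid-core based big Data (Real-time) Analysis (HYDRA), which regards a single node composed of multiple multi-core CPUs and many-core GPUs as a cluster of processors. Here, each processor plays the role of one node. HYDRA is designed to make full use of the computing power of modern heterogeneous-core systems, so that HYDRA on a single node can deliver performance similar to that of MapReduce on a small-scale multi-node cluster. In particular, because HYDRA is based on a shared-memory architecture, it does not incur the excessive cost of transferring data over the network that can arise in the shuffle step of conventional MapReduce. Under the HYDRA framework, this thesis proposes two strategies, "Processor As A Node" (PAAN) and "GPU Mapper CPU Reducer" (GMCR). PAAN is a strategy that regards each CPU or GPU as one computing node. In contrast, GMCR is a strategy that operates GPUs only as mapper nodes and CPUs only as reducer nodes. The two proposed strategies present solutions to the problems of (1) cooperation between CPUs and GPUs, which have different characteristics, (2) managing the different memory hierarchies of those processors, and (3) reducing the cost of sending and receiving data between CPUs and GPUs. Finally, the results of various experiments show that the proposed HYDRA performs more than 14 times better than MapReduce on a small-scale cluster (8 nodes). ⓒ 2013 DGIST	-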
dc.description.degree	Master	-
dc.contributor.department	Information and Communication Engineering	-
dc.contributor.coadvisor	Han, Byung Chan	-
dc.date.awarded	2013. 2	-
dc.publisher.location	Daegu	-
dc.description.database	dCollection	-
dc.date.accepted	2016-05-18	-
dc.contributor.alternativeDepartment	Graduate School, Information and Communication Engineering	-
dc.contributor.affiliatedAuthor	Song, Hyo Chan	-
dc.contributor.affiliatedAuthor	Kim, Min Soo	-
dc.contributor.affiliatedAuthor	Han, Byung Chan	-
dc.contributor.alternativeName	송효찬	-
dc.contributor.alternativeName	김민수	-
dc.contributor.alternativeName	한병찬	-

Files in This Item: 000002262489.pdf
Other data / 1.29 MB / Adobe PDF

Appears in Collections: Department of Electrical Engineering and Computer Science Theses Master

DGIST Scholar was built with support from the OAK distribution project by the National Library of Korea.

Library Services Team, DGIST 333. Techno Jungang-daero, Hyeonpung-myeon, Dalseong-gun, Daegu, 42988, Republic of Korea.
