A Distributed In-situ Analysis Method for Large-scale Scientific Data
- Title
- A Distributed In-situ Analysis Method for Large-scale Scientific Data
- Alternative Title
- An In-situ Analysis Method for Scientific Big Data on Distributed-Environment Systems (분산 환경 기반 시스템에서 과학 기술 빅데이터 in-situ 분석 방법)
- DGIST Authors
- Han, Dong Hyoung ; Kim, Min Soo ; Kang, Won Seok ; Choi, Jihwan P.
- Advisor
- Kim, Min Soo
- Co-Advisor(s)
- Kang, Won Seok ; Choi, Jihwan P.
- Issued Date
- 2016
- Awarded Date
- February 2016
- Citation
- Han, Dong Hyoung. (2016). A Distributed In-situ Analysis Method for Large-scale Scientific Data. doi: 10.22677/thesis.2229871
- Type
- Thesis
- Subject
- In-situ processing ; data loading ; array DBMS ; scientific data format ; scientific data ; distributed system ; in-situ analysis method ; array database
- Abstract
-
The size of scientific data has been increasing rapidly in a variety of domains. Scientific data is represented as array data and is managed in diverse scientific data formats such as HDF, NetCDF, and MDSplus. Although existing array DBMSs such as SciDB and RasDaMan can manage array data, loading data into an array DBMS remains challenging. The data loading process of a distributed array DBMS incurs significant overhead because its four file-format transformation steps are inefficient and require expensive disk I/O.
In this thesis, we propose DISCAN, a distributed in-situ analysis method that processes scientific queries efficiently and directly over raw scientific array data in distributed array DBMSs. Our approach eliminates unnecessary write operations during data loading and processes only the data required by the query. Our in-situ processing consists of two phases, HDF merger and DISCAN. HDF merger manages the raw scientific data so that it can be distributed to the nodes. DISCAN is composed of Local Map, which transforms the raw scientific data into the internal data representation of the DBMS, and Global Map, which redistributes the transformed data according to the partitioning policy of the DBMS. During query processing, DISCAN reads only the required data through the well-defined scientific data format libraries. We evaluate the performance of DISCAN on real-world scientific data. Experimental results show that DISCAN is up to more than 60 times faster than loading the data into the distributed array DBMS and then processing the query. ⓒ 2016 DGIST
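The Local Map / Global Map split and the idea of reading only the data a query needs can be illustrated with a small sketch. It is not the thesis's implementation: a nested Python list stands in for a raw HDF dataset (a real implementation would issue a subset read through the format library, such as an HDF5 hyperslab selection), and the function names `overlapping_chunks`, `local_map`, and `global_map`, the chunk shape, and the chunk-to-node rule (row-major chunk number modulo the node count) are all hypothetical assumptions, not the DBMS's actual partitioning policy.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]          # (row, col) position or chunk index
Cell = Tuple[int, int, float]    # (row, col, value) in the DBMS-internal form


def overlapping_chunks(lo: Coord, hi: Coord, chunk: Coord) -> List[Coord]:
    """Indices of all chunks that intersect the inclusive box [lo, hi]."""
    return [(r, c)
            for r in range(lo[0] // chunk[0], hi[0] // chunk[0] + 1)
            for c in range(lo[1] // chunk[1], hi[1] // chunk[1] + 1)]


def local_map(raw: List[List[float]], lo: Coord, hi: Coord,
              chunk: Coord) -> Dict[Coord, List[Cell]]:
    """Hypothetical Local Map: read only the cells inside the query box
    and group them by chunk, instead of loading the whole raw array."""
    hi = (min(hi[0], len(raw) - 1), min(hi[1], len(raw[0]) - 1))  # clamp to array
    out: Dict[Coord, List[Cell]] = {}
    for cr, cc in overlapping_chunks(lo, hi, chunk):
        # Intersection of this chunk's extent with the query box.
        r0, r1 = max(lo[0], cr * chunk[0]), min(hi[0], (cr + 1) * chunk[0] - 1)
        c0, c1 = max(lo[1], cc * chunk[1]), min(hi[1], (cc + 1) * chunk[1] - 1)
        out[(cr, cc)] = [(i, j, raw[i][j])
                         for i in range(r0, r1 + 1)
                         for j in range(c0, c1 + 1)]
    return out


def global_map(chunks: Dict[Coord, List[Cell]], n_chunk_cols: int,
               n_nodes: int) -> Dict[int, Dict[Coord, List[Cell]]]:
    """Hypothetical Global Map: place each chunk on the node chosen by an
    assumed policy (row-major chunk number modulo the node count)."""
    placement: Dict[int, Dict[Coord, List[Cell]]] = {n: {} for n in range(n_nodes)}
    for (cr, cc), cells in chunks.items():
        placement[(cr * n_chunk_cols + cc) % n_nodes][(cr, cc)] = cells
    return placement


if __name__ == "__main__":
    # A 4x4 array stands in for one raw dataset; value = 4*row + col.
    raw = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    mapped = local_map(raw, lo=(1, 1), hi=(2, 2), chunk=(2, 2))
    print(sum(len(c) for c in mapped.values()), "of 16 cells read")
    for node, owned in global_map(mapped, n_chunk_cols=2, n_nodes=2).items():
        print(node, sorted(owned))
```

Running it reports that only 4 of the 16 cells are read for the 2x2 query box, spread over four chunks, and the assumed policy places chunks (0, 0) and (1, 0) on node 0 and chunks (0, 1) and (1, 1) on node 1. The point of the sketch is the ordering: the query box is resolved against the raw data first, so no full copy of the array is ever written before the query runs.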
- Table Of Contents
-
1. INTRODUCTION 1
2. PRELIMINARIES 6
2.1 Array DBMS 6
2.2 Data loading 9
3. RELATED WORK 12
4. DISCAN 17
4.1 In-situ processing 17
4.2 Modification of a query plan 23
4.3 Distributed in-situ scan operator 27
5. PERFORMANCE EVALUATION 31
6. CONCLUSIONS 40
7. REFERENCES 41
- URI
-
http://dgist.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000002229871
http://hdl.handle.net/20.500.11750/1474
- Degree
- Master
- Department
- Information and Communication Engineering
- Publisher
- DGIST
