
A distributed in-situ analysis method for large-scale scientific data

Han, Donghyoung; Nam, Yoon-Min; Kim, Min-Soo
DGIST Authors
Han, Donghyoung; Nam, Yoon-Min; Kim, Min-Soo
Issue Date
2017 IEEE International Conference on Big Data and Smart Computing, BigComp 2017, 69-75
Article Type
Conference Paper
Recently, massive amounts of data have been generated in a wide range of scientific applications, such as NASA's satellites, the Large Hadron Collider, and the Large Synoptic Survey Telescope. Most scientific data follows the array model, and there are various standard array formats, such as HDF, NetCDF, MDSplus, and ROOT. SciDB is the most well-known DBMS that stores array-based scientific data and processes queries on it. SciDB is a distributed DBMS and is therefore scalable in terms of query performance. However, it has a severe drawback: loading a massive amount of scientific data into the DBMS takes a huge amount of time. That is, it is not scalable in terms of data loading. To overcome this problem, we propose a distributed in-situ analysis method that processes queries on raw scientific data in a distributed manner without explicit data loading. Specifically, we propose an in-situ scan operator that scans the necessary data in the array format and passes it to the upper operators in the pipeline of a query plan. The operator also repartitions the data during in-situ scanning, which is required for correct query results. Through experiments on real datasets, we show that the SciDB system using our method outperforms the original SciDB system by orders of magnitude in terms of the performance of the first query. © 2017 IEEE.
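The idea of scanning raw array data in place and repartitioning it during the scan can be illustrated with a minimal single-process sketch. This is not the authors' SciDB implementation: the names `in_situ_scan`, `owner_of`, and `scan_and_repartition`, the nested-list stand-in for a raw array file, and the chunk-to-node placement rule are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Cell = Tuple[int, int, float]  # (row, col, value)


def in_situ_scan(
    raw: Sequence[Sequence[float]],
    row_range: Tuple[int, int],
    col_range: Tuple[int, int],
) -> Iterator[Cell]:
    """Yield only the cells in the requested region, reading `raw` in place.

    `raw` stands in for a raw array file; nothing is copied into a store
    before the query runs (no explicit loading step).
    """
    r0, r1 = row_range
    c0, c1 = col_range
    for r in range(r0, min(r1, len(raw))):
        row = raw[r]
        for c in range(c0, min(c1, len(row))):
            yield r, c, row[c]


def owner_of(r: int, c: int, chunk: int, n_nodes: int) -> int:
    """Hypothetical placement rule: map a cell's chunk coordinates to a node."""
    return ((r // chunk) + (c // chunk)) % n_nodes


def scan_and_repartition(
    raw: Sequence[Sequence[float]],
    row_range: Tuple[int, int],
    col_range: Tuple[int, int],
    chunk: int,
    n_nodes: int,
) -> Dict[int, List[Cell]]:
    """Route each scanned cell to the node that owns its chunk, during the scan."""
    buckets: Dict[int, List[Cell]] = defaultdict(list)
    for r, c, v in in_situ_scan(raw, row_range, col_range):
        buckets[owner_of(r, c, chunk, n_nodes)].append((r, c, v))
    return dict(buckets)


def node_sum(cells: List[Cell]) -> float:
    """An 'upper operator' in the pipeline: a per-node partial aggregate."""
    return sum(v for _, _, v in cells)


# A 4x4 raw array with values 0.0 .. 15.0, split into 2x2 chunks over 2 nodes.
raw = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
parts = scan_and_repartition(raw, (0, 4), (0, 4), chunk=2, n_nodes=2)
total = sum(node_sum(cells) for cells in parts.values())
```

In this sketch the scan is a generator, so cells flow to the upper operator one at a time, and the per-node partial sums combine to the same total as a scan over the whole array.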
Institute of Electrical and Electronics Engineers Inc.
Department of Information and Communication Engineering > InfoLab > 2. Conference Papers
