The size of scientific data has been increasing rapidly in a variety of domains. Scientific data is typically represented as arrays and stored in diverse scientific data formats such as HDF, NetCDF, and MDSplus. Although existing array DBMSs such as SciDB and RasDaMan manage array data, loading data into an array DBMS remains challenging. The data loading process of a distributed array DBMS incurs significant overhead because its four inefficient file-format transformation steps require expensive disk I/O. In this paper, we propose DISCAN, a distributed in-situ analysis method that processes scientific queries efficiently and directly over raw scientific array data in distributed array DBMSs. Our approach eliminates unnecessary write operations during data loading and processes only the data required by the query. Our in-situ processing consists of two phases, HDF merger and DISCAN. HDF merger manages the raw scientific data so that it can be distributed to nodes. DISCAN is composed of Local Map, which transforms the raw scientific data into the internal data representation of the DBMS, and Global Map, which relocates the transformed data according to the partitioning policy of the DBMS. DISCAN reads only the data required during query processing, using well-established scientific data format libraries. We evaluate the performance of DISCAN on a real-world scientific dataset. Experimental results show that DISCAN outperforms query processing after data loading in the distributed array DBMS by a factor of more than 60 in the best case. ⓒ 2016 DGIST
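The flow described in the abstract (read only the queried region, convert it to an internal chunked form, then place chunks on nodes) can be illustrated with a toy sketch. This is not the thesis implementation: `read_region`, `local_map`, `global_map`, the chunk size, the node count, and the round-robin placement are all hypothetical stand-ins, and a nested Python list takes the place of a real HDF file read through a format library.

```python
from collections import defaultdict

CHUNK = 2   # hypothetical chunk edge length
NODES = 2   # hypothetical number of DBMS instances


def read_region(raw, rows, cols):
    """Yield (row, col, value) only for cells inside the queried region.

    Stands in for a partial read through a format library; `raw` is a
    plain nested list here, not a real HDF file.
    """
    for r in range(rows[0], rows[1]):
        for c in range(cols[0], cols[1]):
            yield r, c, raw[r][c]


def local_map(cells):
    """Group cells into chunks keyed by chunk coordinates (Local Map stand-in)."""
    chunks = defaultdict(dict)
    for r, c, v in cells:
        chunks[(r // CHUNK, c // CHUNK)][(r, c)] = v
    return dict(chunks)


def global_map(chunks, chunks_per_row):
    """Assign each chunk to a node (Global Map stand-in).

    Round-robin over the row-major chunk index is an assumed policy,
    not the partitioning policy of any particular DBMS.
    """
    placement = defaultdict(dict)
    for (cr, cc), cells in chunks.items():
        node = (cr * chunks_per_row + cc) % NODES
        placement[node][(cr, cc)] = cells
    return dict(placement)


raw = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy array
cells = read_region(raw, rows=(0, 2), cols=(0, 4))        # query touches the top half only
placement = global_map(local_map(cells), chunks_per_row=4 // CHUNK)
```

In this sketch only the eight cells inside the queried region are ever read or transformed; the bottom half of the toy array is never touched, which is the property the in-situ approach relies on.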
Table Of Contents
1. INTRODUCTION
2. PRELIMINARIES
2.1 Array DBMS
2.2 Data loading
3. RELATED WORK
4. DISCAN
4.1 In-situ processing
4.2 Modification of a query plan
4.3 Distributed in-situ scan operator
5. PERFORMANCE EVALUATION
6. CONCLUSIONS
7. REFERENCES
Research Interests
Data Mining & Machine Learning for Text & Multimedia; Brain-Sense-ICT Convergence Computing; Computational Olfaction Measurement; Simulation & Modeling