SciDFS: An In-situ Processing System for Scientific Array Data based on Distributed File System
- Han, Dong Hyoung; Nam, Yoon-Min; Kim, Min-Soo; Park, Kyongseok; Han, Sunggeun
- DGIST Authors
- Kim, Min-Soo
- Issue Date
- IEEE International Conference on Big Data and Smart Computing, BigComp 2018
- The amount of array data generated by scientific observation instruments has been increasing rapidly. Such data is usually stored in standard formats such as HDF5 and NetCDF. To support high-level queries on array data, a number of array DBMSs such as SciDB have been proposed. However, they typically have two drawbacks: data loading is slow, and standard formats are not supported directly. Slow data loading is especially serious because scientific data may be generated faster than it can be loaded. To address these drawbacks, we propose SciDFS, a distributed in-situ processing system that uses a distributed file system (DFS) to store and manage array data. SciDFS is a hybrid system that tightly integrates the query processing layer of an array DBMS with a DFS via an in-situ layer. It stores raw array data quickly as DFS blocks and processes queries in situ by accessing the relevant DFS blocks. Experiments on real NASA satellite array data demonstrate three major features of SciDFS: high-performance data loading (50X faster than SciDB), fast in-situ query processing, and the ability to run legacy applications for the HDF5 format. © 2018 IEEE.
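The abstract's idea of answering a query by touching the relevant DFS blocks can be illustrated with a minimal sketch. It assumes, purely for illustration, that an array is split into fixed-size rectangular chunks and that each chunk maps to one block; the function name `relevant_blocks` and this layout are hypothetical and are not taken from the paper.

```python
from itertools import product


def relevant_blocks(query_start, query_stop, chunk_shape):
    """Return the chunk coordinates that overlap a query box.

    Hypothetical layout: the array is tiled by chunks of `chunk_shape`,
    and the query is the half-open box [query_start, query_stop) given
    as one index per dimension.
    """
    per_dim = []
    for start, stop, size in zip(query_start, query_stop, chunk_shape):
        if stop <= start:
            # An empty extent in any dimension means nothing to read.
            return []
        first = start // size          # chunk holding the first cell
        last = (stop - 1) // size      # chunk holding the last cell
        per_dim.append(range(first, last + 1))
    # Cartesian product of the per-dimension chunk ranges.
    return list(product(*per_dim))


# A 2-D query over rows 100..299 and columns 250..399 with 128x128 chunks.
blocks = relevant_blocks((100, 250), (300, 400), (128, 128))
print(len(blocks), blocks[0], blocks[-1])
```

Under these assumptions, a query reads only the chunks returned by the function instead of scanning the whole array, which is the general benefit the abstract attributes to in-situ processing.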