Full metadata record

DC Field Value Language
dc.contributor.author Nam, Yoon-Min -
dc.contributor.author Han, Donghyoung -
dc.contributor.author Kim, Min-Soo -
dc.date.accessioned 2019-01-27T13:53:28Z -
dc.date.available 2019-01-27T13:53:28Z -
dc.date.created 2019-01-17 -
dc.date.issued 2019-04 -
dc.identifier.issn 0020-0255 -
dc.identifier.uri http://hdl.handle.net/20.500.11750/9526 -
dc.description.abstract As parallel database systems have large amounts of data to process, it is important to utilize a scalable and efficient horizontal database partitioning method. The existing partitioning methods have major drawbacks: they not only cause large amounts of data redundancy but also, despite that redundancy, still require expensive shuffle operations for join queries in many cases. We elucidate the drawbacks originating from the tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that both improves the query performance and reduces data redundancy. We integrate the proposed GPT method into a parallel query processing system, Spark SQL, across all the relevant layers and modules, including the query plan generator and the scan operator. Through extensive experiments using three benchmarks, TPC-DS, IMDB and BioWarehouse, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance. © 2018 Elsevier Inc. -
dc.language English -
dc.publisher Elsevier BV -
dc.title A parallel query processing system based on graph-based database partitioning -
dc.type Article -
dc.identifier.doi 10.1016/j.ins.2018.12.031 -
dc.identifier.scopusid 2-s2.0-85059005654 -
dc.identifier.bibliographicCitation Information Sciences, v.480, pp.237 - 260 -
dc.description.isOpenAccess FALSE -
dc.subject.keywordAuthor Graph-based partitioning -
dc.subject.keywordAuthor Horizontal database partitioning -
dc.subject.keywordAuthor Parallel query processing -
dc.subject.keywordPlus Benchmarking -
dc.subject.keywordPlus Database systems -
dc.subject.keywordPlus Digital storage -
dc.subject.keywordPlus Graphic methods -
dc.subject.keywordPlus Redundancy -
dc.subject.keywordPlus Trees (mathematics) -
dc.subject.keywordPlus Database partitioning -
dc.subject.keywordPlus Graph-based -
dc.subject.keywordPlus Large amounts of data -
dc.subject.keywordPlus Parallel database systems -
dc.subject.keywordPlus Parallel query processing -
dc.subject.keywordPlus Partitioning methods -
dc.subject.keywordPlus Query performance -
dc.subject.keywordPlus State-of-the-art methods -
dc.subject.keywordPlus Query processing -
dc.citation.endPage 260 -
dc.citation.startPage 237 -
dc.citation.title Information Sciences -
dc.citation.volume 480 -
Appears in Collections:
Department of Electrical Engineering and Computer Science > InfoLab > 1. Journal Articles
