
SGMiner: A Fast and Scalable GPU-Based Frequent Pattern Miner on SSDs

Title
SGMiner: A Fast and Scalable GPU-Based Frequent Pattern Miner on SSDs
Author(s)
Chon, Kang-Wook; Yi, Eunjeong; Kim, Min-Soo
Issued Date
2022-06
Citation
IEEE Access, v.10, pp.62502 - 62519
Type
Article
Author Keywords
Big data; frequent pattern mining; parallel algorithm; GPUs; scalable algorithm; disk-based algorithm
Keywords
ALGORITHM
ISSN
2169-3536
Abstract
Frequent itemset mining is widely used as an essential data mining technique. However, as data size grows, its applicability decreases owing to the relatively poor performance of existing methods. Although numerous efficient sequential frequent itemset mining methods have been developed, their achievable performance is limited because they exploit only one thread. To overcome this limitation, a number of parallel methods using multi-core central processing units (CPUs), multiple machines, or many-core graphics processing units (GPUs) have been proposed. However, these methods are still relatively slow and scale poorly, mainly owing to large memory requirements for intermediate data, significant disk I/O, and heavy computation. In this study, to resolve these problems, we propose SGMiner, a new, fast, and scalable GPU- and disk-based method for extracting frequent patterns on a single machine equipped with multiple GPUs and multiple solid-state drives (SSDs). It is based on an algorithm similar to the Apriori algorithm and, owing to its exploitation of SSDs, incurs neither intermediate data nor large disk I/O overheads. Moreover, we propose storing transaction databases as bitmap transaction chunks in SSDs, streaming the chunks to GPU device memory via the main memory with reduced I/O overhead, and performing fast support counting on the GPUs based on the chunks. In addition, when exploiting multiple GPUs and SSDs, we propose replicating the bitmap transaction chunks stored in SSDs to the GPUs in a streaming fashion. This can allow the workload to be distributed almost evenly across multiple GPUs with reduced I/O overhead. Our experiments demonstrate that SGMiner outperforms existing methods in scalability and performance, with enhanced robustness.
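To make the idea of support counting over bitmap transaction chunks concrete, the following is a minimal CPU-only sketch in Python. It is not SGMiner's implementation: the names `build_chunks` and `count_support` are hypothetical, Python integers stand in for the per-item bitmaps, and the loop over chunks only stands in for streaming chunks from SSD to GPU device memory. The chunk layout (one bitmap per item, one bit per transaction) is an assumption made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Set

# Assumed layout: a chunk maps each item to a bitmap in which
# bit t is set when transaction t of that chunk contains the item.
Chunk = Dict[int, int]


def build_chunks(transactions: Sequence[Set[int]], chunk_size: int) -> List[Chunk]:
    """Split the transaction database into fixed-size bitmap chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[Chunk] = []
    for start in range(0, len(transactions), chunk_size):
        chunk: Chunk = {}
        for offset, items in enumerate(transactions[start:start + chunk_size]):
            for item in items:
                chunk[item] = chunk.get(item, 0) | (1 << offset)
        chunks.append(chunk)
    return chunks


def count_support(chunks: Iterable[Chunk], candidate: Set[int]) -> int:
    """Count the transactions that contain every item of the candidate itemset."""
    items = sorted(candidate)
    if not items:
        raise ValueError("candidate itemset must be non-empty")
    total = 0
    for chunk in chunks:  # stands in for streaming one chunk at a time
        bits = chunk.get(items[0], 0)
        for item in items[1:]:
            bits &= chunk.get(item, 0)  # bitwise AND intersects transaction sets
        total += bin(bits).count("1")   # population count gives the partial support
    return total


transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]
chunks = build_chunks(transactions, chunk_size=2)
print(count_support(chunks, {1, 2}))     # 3
print(count_support(chunks, {1, 2, 3}))  # 2
```

Because each chunk contributes an independent partial count, the per-chunk work in this sketch could in principle be handed to different devices and summed afterwards, which is the property the abstract relies on when it describes distributing the workload across multiple GPUs.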
URI
http://hdl.handle.net/20.500.11750/17431
DOI
10.1109/ACCESS.2022.3179592
Publisher
Institute of Electrical and Electronics Engineers Inc.

Appears in Collections:
ETC 1. Journal Articles


