
GMiner: A fast GPU-based frequent itemset mining method for large-scale data

Authors
Chon, Kang Wook; Hwang, Sang Hyun; Kim, Min Soo
DGIST Authors
Kim, Min Soo
Issue Date
2018-05
Citation
Information Sciences, 439-440, 19-38
Type
Article
Article Type
Article
Keywords
Computer graphics; Computer graphics equipment; Data mining; Forestry; Higher order statistics; Parallel algorithms; Program processors; Computational power; Enumeration trees; Frequent itemset mining; Graphic processing unit (GPU); Intermediate level; Multiple machine; Orders of magnitude; Workload skewness; Graphics processing unit
ISSN
0020-0255
Abstract
Frequent itemset mining is widely used as a fundamental data mining technique. However, as data sizes grow, the relatively slow performance of existing methods limits its applicability. Although many sequential frequent itemset mining methods have been proposed, there is a clear limit to the performance achievable with a single thread. To overcome this limitation, various parallel methods using multi-core CPUs, multiple machines, or many-core graphics processing units (GPUs) have been proposed. However, these methods still have drawbacks, including relatively slow performance, data size limitations, and poor scalability due to workload skewness. In this paper, we propose GMiner, a fast GPU-based frequent itemset mining method for large-scale data. GMiner achieves high performance by fully exploiting the computational power of GPUs and is suitable for large-scale data. The method performs mining tasks in a counterintuitive way: it mines the patterns starting from the first level of the enumeration tree rather than storing and reusing the patterns at the intermediate levels of the tree. This approach is effective in terms of both performance and memory usage on the GPU architecture. In addition, GMiner solves the workload skewness problem from which existing parallel methods suffer; as a result, its performance increases almost linearly with the number of GPUs. Through extensive experiments, we demonstrate that GMiner significantly outperforms other representative sequential and parallel methods in most cases, by orders of magnitude on the tested benchmarks. © 2018 The Authors
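The abstract's central idea, evaluating each candidate itemset from the first level of the enumeration tree instead of keeping intermediate-level patterns in memory, can be illustrated with a small CPU-only Python sketch. It assumes a vertical bitmap layout (one bitset of transaction IDs per item) and Apriori-style level-wise candidate generation; the function names `item_bitmaps`, `support_from_first_level`, and `mine` are hypothetical, and this is not GMiner's GPU implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def popcount(x: int) -> int:
    # Number of set bits, i.e. number of transactions containing the itemset.
    return bin(x).count("1")


def item_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Build one vertical bitmap per item: bit t is set if transaction t contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_from_first_level(itemset: Iterable[str], bitmaps: Dict[str, int]) -> int:
    """Recompute support by ANDing level-1 item bitmaps only.

    No bitmap for any intermediate-level pattern is stored or reused.
    """
    items = sorted(itemset)
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)
    return popcount(acc)


def mine(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise mining that keeps only itemset identities between levels."""
    bitmaps = item_bitmaps(transactions)
    frequent = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    result = {frozenset([i]): popcount(bitmaps[i]) for i in frequent}
    level = [frozenset([i]) for i in frequent]
    k = 2
    while level:
        prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Keep a k-itemset only if all of its (k-1)-subsets were frequent.
            if len(union) == k and all(union - {x} in prev for x in union):
                candidates.add(union)
        level = []
        for cand in candidates:
            s = support_from_first_level(cand, bitmaps)
            if s >= min_support:
                result[cand] = s
                level.append(cand)
        k += 1
    return result


if __name__ == "__main__":
    txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(mine(txns, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

The trade-off in this sketch is that each support count repeats up to k − 1 AND operations over the level-1 bitmaps, while memory holds only the per-item bitmaps rather than one bitmap per intermediate pattern. The abstract reports that this kind of recomputation pays off on GPUs in both performance and memory use; the sketch only shows the access pattern, not that performance behavior.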
URI
http://hdl.handle.net/20.500.11750/5914
DOI
10.1016/j.ins.2018.01.046
Publisher
Elsevier Inc.
Related Researcher
  • Author: Kim, Min Soo (InfoLab)
  • Research Interests: Big Data Systems; Big Data Mining & Machine Learning; Big Data Bioinformatics; Data Mining and Big Data Analytics; Bioinformatics and Neuroinformatics; Brain-Machine Interface (BMI)
Files:
There are no files associated with this item.
Collection:
Department of Information and Communication Engineering > InfoLab > 1. Journal Articles
