BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Authors
Chon, Kang Wook; Kim, Min Soo
DGIST Authors
Kim, Min Soo
Issue Date
ACCEPT
Citation
Cluster Computing, 1-14
Type
Article
Article Type
Article in Press
Keywords
Data communication systems; Data mining; Higher order statistics; Parallel algorithms; Scalability; Bitwise operations; Frequent itemset mining; Frequent pattern mining; High scalabilities; Large-scale datasets; Map-reduce; Scalable algorithms; State-of-the-art methods; Big data
ISSN
1386-7857
Abstract
Frequent itemset mining is widely used as a fundamental data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed of sequential mining methods. However, existing MapReduce-based methods still scale poorly because of high workload skew, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting using only transaction chunks and bitwise operations, without generating or shuffling intermediate data. As a result, BIGMiner achieves very high scalability: it has no workload skew, produces no intermediate data, and incurs little network communication overhead. Through extensive experiments on large-scale datasets of up to 6.5 billion transactions, we show that BIGMiner consistently and significantly outperforms state-of-the-art methods without any memory problems. © 2018 Springer Science+Business Media, LLC, part of Springer Nature
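The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of the general idea it names: splitting a database into equal-sized transaction chunks and counting support with bitwise operations. Each chunk is stored in a vertical layout where every item maps to an integer bitmap (bit t set if transaction t of the chunk contains the item); the support of an itemset in a chunk is the popcount of the AND of its item bitmaps, and the total support is the sum over chunks. The function names (`split_into_chunks`, `build_bitmaps`, `chunk_support`, `support`, `frequent_itemsets`) and the Apriori-style candidate generation are hypothetical illustrative choices, not BIGMiner's actual implementation or its MapReduce job structure.

```python
from functools import reduce
from itertools import combinations


def split_into_chunks(transactions, chunk_size):
    """Split the database into equal-sized chunks (the last one may be shorter)."""
    return [transactions[i:i + chunk_size] for i in range(0, len(transactions), chunk_size)]


def build_bitmaps(chunk):
    """Vertical layout (assumed): item -> int whose bit t is set if transaction t contains it."""
    bitmaps = {}
    for t, transaction in enumerate(chunk):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def chunk_support(bitmaps, itemset):
    """Support within one chunk: popcount of the bitwise AND of the item bitmaps."""
    if any(item not in bitmaps for item in itemset):
        return 0
    combined = reduce(lambda a, b: a & b, (bitmaps[item] for item in itemset))
    return bin(combined).count("1")


def support(chunk_bitmaps, itemset):
    """Total support: sum of per-chunk counts (no per-transaction intermediate data)."""
    return sum(chunk_support(b, itemset) for b in chunk_bitmaps)


def frequent_itemsets(transactions, min_support, chunk_size):
    """Level-wise (Apriori-style, our assumption) search using chunked bitwise counting."""
    chunk_bitmaps = [build_bitmaps(c) for c in split_into_chunks(transactions, chunk_size)]
    level = [(item,) for item in sorted({item for t in transactions for item in t})]
    result = {}
    while level:
        frequent = []
        for itemset in level:
            s = support(chunk_bitmaps, itemset)
            if s >= min_support:
                result[itemset] = s
                frequent.append(itemset)
        # Join frequent k-itemsets sharing a (k-1)-prefix, then prune candidates
        # that have an infrequent k-subset.
        frequent_set = set(frequent)
        next_level = []
        for a, b in combinations(frequent, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],) if a[-1] < b[-1] else b + (a[-1],)
                if all(sub in frequent_set for sub in combinations(cand, len(cand) - 1)):
                    next_level.append(cand)
        level = next_level
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(frequent_itemsets(db, min_support=3, chunk_size=2))
```

Because every chunk has the same number of transactions, each chunk's bitmaps have a bounded width, which is one plausible reading of how equal-sized chunks avoid workload skew; how BIGMiner actually distributes chunks across workers and combines counts is not shown here.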
URI
http://hdl.handle.net/20.500.11750/5910
DOI
10.1007/s10586-018-1812-0
Publisher
Springer New York LLC
Related Researcher
  • Author: Kim, Min Soo (InfoLab)
  • Research Interests: Big Data Systems; Big Data Mining & Machine Learning; Big Data Bioinformatics; Data mining and big data analysis; Bioinformatics and neuroinformatics; Brain-machine interface (BMI)
Collection:
Department of Information and Communication Engineering > InfoLab > 1. Journal Articles