DGIST Scholar: BIGMiner: a fast and scalable distributed frequent pattern miner for big data


Full metadata record

DC Field	Value	Language
dc.contributor.author	Chon, Kang-Wook	-
dc.contributor.author	Kim, Min-Soo	-
dc.date.accessioned	2018-03-07T04:21:56Z	-
dc.date.available	2018-03-07T04:21:56Z	-
dc.date.created	2018-02-26	-
dc.date.issued	2018-09	-
dc.identifier.citation	Cluster Computing, v.21, no.3, pp.1507 - 1520	-
dc.identifier.issn	1386-7857	-
dc.identifier.uri	http://hdl.handle.net/20.500.11750/5910	-
dc.description.abstract	Frequent itemset mining is widely used as a fundamental data mining technique. Recently, a number of MapReduce-based frequent itemset mining methods have been proposed to overcome the limits on data size and mining speed of sequential mining methods. However, the existing MapReduce-based methods still do not scale well due to high workload skewness, large intermediate data, and large network communication overhead. In this paper, we propose BIGMiner, a fast and scalable MapReduce-based frequent itemset mining method. BIGMiner generates equal-sized sub-databases called transaction chunks and performs support counting based only on transaction chunks and bitwise operations, without generating and shuffling intermediate data. As a result, BIGMiner achieves very high scalability due to no workload skewness, no intermediate data, and small network communication overhead. Through extensive experiments using large-scale datasets of up to 6.5 billion transactions, we show that BIGMiner consistently and significantly outperforms the state-of-the-art methods without any memory problems. © 2018 Springer Science+Business Media, LLC, part of Springer Nature	-
dc.language	English	-
dc.publisher	Springer New York LLC	-
dc.title	BIGMiner: a fast and scalable distributed frequent pattern miner for big data	-
dc.type	Article	-
dc.identifier.doi	10.1007/s10586-018-1812-0	-
dc.identifier.wosid	000457275200004	-
dc.identifier.scopusid	2-s2.0-85041818619	-
dc.type.local	Article(Overseas)	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.citation.publicationname	Cluster Computing	-
dc.contributor.nonIdAuthor	Chon, Kang-Wook	-
dc.identifier.citationVolume	21	-
dc.identifier.citationNumber	3	-
dc.identifier.citationStartPage	1507	-
dc.identifier.citationEndPage	1520	-
dc.identifier.citationTitle	Cluster Computing	-
dc.type.journalArticle	Article	-
dc.description.isOpenAccess	N	-
dc.subject.keywordAuthor	Big data	-
dc.subject.keywordAuthor	Distributed algorithm	-
dc.subject.keywordAuthor	Frequent pattern mining	-
dc.subject.keywordAuthor	MapReduce	-
dc.subject.keywordAuthor	Scalable algorithm	-
dc.contributor.affiliatedAuthor	Chon, Kang-Wook	-
dc.contributor.affiliatedAuthor	Kim, Min-Soo	-
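
The abstract describes support counting over equal-sized transaction chunks using bitwise operations. The following is a minimal single-process sketch of that general idea, not the authors' implementation: it involves no MapReduce, and the function names `build_chunks` and `support`, the `chunk_size` parameter, and the use of Python integers as bitmaps are all assumptions made for illustration. Each chunk stores one bitmap per item, where bit i is set if the i-th transaction of the chunk contains the item; the support of an itemset is then the number of set bits left after ANDing its items' bitmaps, summed over chunks.

```python
def build_chunks(transactions, chunk_size):
    """Split transactions into chunks of chunk_size (the last chunk may be
    smaller) and build, per chunk, a dict mapping item -> bitmap (int).
    Hypothetical helper for illustration only."""
    chunks = []
    for start in range(0, len(transactions), chunk_size):
        bitmaps = {}
        for pos, txn in enumerate(transactions[start:start + chunk_size]):
            for item in txn:
                # Set bit `pos` to record that this transaction contains `item`.
                bitmaps[item] = bitmaps.get(item, 0) | (1 << pos)
        chunks.append(bitmaps)
    return chunks


def support(itemset, chunks):
    """Count transactions containing every item of a non-empty itemset by
    ANDing the per-item bitmaps chunk by chunk."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    total = 0
    for bitmaps in chunks:
        acc = bitmaps.get(items[0], 0)
        for item in items[1:]:
            if acc == 0:
                break  # no transaction in this chunk can match
            acc &= bitmaps.get(item, 0)
        total += bin(acc).count("1")  # population count of matching bits
    return total


transactions = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
    {"a", "b"},
]
chunks = build_chunks(transactions, chunk_size=2)
print(support({"a", "b"}, chunks))
```

Because each chunk is counted independently and only the per-chunk totals need to be summed, a layout of this kind lets the counting work be divided evenly across workers; how BIGMiner actually distributes and schedules chunks is not specified in this record.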

Files in This Item: There are no files associated with this item.

Appears in Collections: Department of Electrical Engineering and Computer Science > InfoLab > 1. Journal Articles

