Cited 1 time in
Cited 1 time in
DSP-CC-: I/O Efficient Parallel Computation of Connected Components in Billion-Scale Networks
- DSP-CC-: I/O Efficient Parallel Computation of Connected Components in Billion-Scale Networks
- Kim, MS[Kim, Min-Soo]; Lee, S[Lee, Sangyeon]; Han, WS[Han, Wook-Shin]; Park, H[Park, Himchan]; Lee, JH[Lee, Jeong-Hoon]
- DGIST Authors
- Kim, MS[Kim, Min-Soo]; Park, H[Park, Himchan]
- Issue Date
- IEEE Transactions on Knowledge and Data Engineering, 27(10), 2658-2671
- Article Type
- Algorithms; Connected Component; Connected Components; Disk-Based; Disks (Machine Components); Graphic Methods; Graphs; Parallel; Parallel Algorithms; SSD
- Computing connected components is a core operation on graph data. Since billion-scale graphs cannot be resident in memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are Hash-To-Min and PowerGraph. Hash-To-Min is the state-of-the art disk-based distributed method which minimizes the number of MapReduce rounds. PowerGraph is the-state-of-the-art in-memory distributed system, which is typically faster than the disk-based distributed one, however, requires a lot of machines for handling billion-scale graphs. In this paper, we propose an I/O efficient parallel algorithm for billion-scale graphs in a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing (DSP) model that exploits sequential disk access in terms of disk I/Os and parallel processing in terms of computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, DSP-CC, which largely improves the performance through sequential disk scan and page-level cache-conscious parallel processing. Extensive experimental results show that DSP-CC 1) computes connected components in billion-scale graphs using the limited memory size whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and 2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method. © 2015 IEEE.
- IEEE COMPUTER SOC
- Related Researcher
Kim, Min Soo
Big Data Systems; Big Data Mining & Machine Learning; Big Data Bioinformatics; 데이터 마이닝 및 빅데이터 분석; 바이오인포메틱스 및 뉴로인포메틱스; 뇌-기계 인터페이스(BMI)
There are no files associated with this item.
- Department of Information and Communication EngineeringInfoLab1. Journal Articles
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.